<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Rastorguev S.P. Software methods for protecting information in computers and networks. Moscow:
«Yakhtsmen» Agency Publishing House</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3313831.3376770</article-id>
      <title-group>
        <article-title>Method of User Authentication by Keyboard Handwriting based on Neural Networks and Genetic Algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrii Pryimak</string-name>
          <email>andrii.pryimak@live.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yurii Yaremchuk</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olha Salieva</string-name>
          <email>salieva8257@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasyl Karpinets</string-name>
          <email>karpinets@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nataliia Kunanets</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>12 Bandera street, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vinnytsia National Technical University</institution>
          ,
          <addr-line>Khmelnytsky highway 95, Vinnytsia, 21000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1993</year>
      </pub-date>
      <volume>188</volume>
      <issue>4</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>A method of user authentication based on keyboard handwriting with error injection is proposed. It is based on a two-level neural network architecture that uses five time functions and a built-in sigmoid activation function to increase the efficiency of the neural network. Error injection into the code word is also introduced, which makes it possible to collect more accurate data on a person's keyboard handwriting and to increase the accuracy of correct user recognition and successful authentication by 3-11% compared to existing methods. The use of a hash function based on a genetic algorithm is proposed, which secures the storage of the code word in the database.</p>
        <p>Keywords: information security, user authentication, neural network, keyboard handwriting, genetic algorithm</p>
        <p>International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2021), September 20-21, 2021, Kharkiv, Ukraine. ORCID: 0000-0001-9695-0462 (A. Pryimak); 0000-0002-6303-7703 (Y. Yaremchuk); 0000-0003-2388-7321 (O. Salieva); 0000-0001-8148-</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Given the rapid development of information technology, the growing number of
information threats, the uncertainty of their origin and implementation, and the
complexity and specialized focus of information security systems, the task of building an
information security system becomes increasingly relevant. One of the methods of information protection is user
authentication. User authentication is the verification that the user being authenticated is who he or she
claims to be [1]. For correct authentication, the user must present some unique
information that belongs only to that user and to no one else.</p>
      <p>Traditional methods of identification and authentication, based on smart cards, USB keys,
electronic keys or other portable identifiers, as well as passwords and access codes, have significant
disadvantages: the item can be stolen from the user; special equipment is needed
to work with magnetic cards, smart cards and similar items; and a copy of the unique item can be made.
In general, the main disadvantage of such methods is that authentication is not always reliable [2]. This
shortcoming can be eliminated by using biometric authentication methods, such as the dynamics of
the user's keystrokes. Biometric characteristics are an integral part of a human being and therefore
cannot be falsified, lost or forgotten.</p>
      <p>Keystroke dynamics, that is, the typing rhythm of a user working on the
keyboard, provide a high level of security and are also practical to deploy: the
method is inexpensive to implement, whereas scanning fingerprints
or irises requires additional equipment to achieve authentication [3].</p>
      <p>Another modern approach to solving the problem of authentication is the use of neural networks.
A number of architectures have already become classic: the maximum search network (MAXNET), instar
and outstar, single-layer perceptron, multilayer perceptron, radial basis function (RBF) network, Hopfield
network, and the Hamming, Kosko, McCulloch-Pitts, Kohonen, Grossberg, probabilistic and ART networks. Each
class of applied problems thus has its own suitable neural network architecture. Applying a neural network
approach to the authentication problem can improve the accuracy of user authentication, because this
approach filters out random interference present in the input data, which makes it possible to
dispense with the smoothing algorithms for experimental dependencies that statistical data
processing requires [4].</p>
      <p>2021 Copyright for this paper by its authors.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Methods of user authentication by keyboard handwriting have come into use relatively recently. In
their work, Leggett, Umphress and Williams [5] conducted an experiment with 17 programmers. They
measured the timing of pairs of successive keystrokes, known as digraphs. The first set consisted of 1400 characters and was used
at the learning stage; the second set consisted of 300 characters and was used for verification.
Leggett reported an authentication probability of 89.5%. In their experiments, the authors
allowed a deviation of 0.5 from the mean digraph latency, and the user was
considered recognized if 60% or more of the measured delays fell within the allowable deviation of the
sample.</p>
      <p>Other work in this area was carried out by Garris, Young and Hammon [6]. In their proposed
approach, the covariance matrix of the associated delay vectors was used as the parameter that
captures individual handwriting. The Mahalanobis distance was then used to determine
the similarity between the handwriting being identified and the user profile. Young and
Hammon, by contrast, used the Euclidean distance between two vectors to compare a number of attributes.</p>
      <p>Another well-known work was presented by Rastorguev [7]. In his monograph, the author divided
the authentication procedure into two types. The first type is password authentication, where the user
goes through the authentication procedure by password, the second type is the authentication of users
by a set of random phrases. He also singled out two modes of the authentication procedure: the system
setup procedure and the authentication procedure.</p>
      <p>When users were identified by typing free text, the keyboard was divided into four parts in the
setup mode of the authentication system. While the user worked on the keyboard, the time intervals between
these four parts were calculated, regardless of which key was pressed within each part. In the authentication
mode, the current values were compared with the reference values and the system made a decision. The main
assumption was that the temporal characteristics of users follow a normal (Gaussian)
distribution. Algorithms for excluding gross errors, for system setup and for authentication were also presented in
the monograph.</p>
      <p>Another paper was presented by Saket Maheshwari and Vikram Pudi [8], who proposed a method for
identifying a user by keyboard handwriting based on a five-layer neural network. The authors
investigated in detail three techniques for building the neural network
architecture (dropout, rectified linear units and batch normalization). Their studies
showed that the accuracy of user identification reaches 85.22-93.59%. However, it should be noted
that the best accuracy was achieved with a five-layer neural network, which significantly
increases the training time of the network (up to 9 minutes per user).</p>
      <p>Nura Hanura's work [9] focuses on using the time interval between keystrokes as a feature of a set
of individual characters to identify authentic users. A four-layer multilayer
perceptron (MLP) trained with backpropagation (BP) is used for training and testing. The
results of this study showed that the accuracy of user identification is in the range of 90-92%.</p>
      <p>The authors' previous work, "Method of user identification by keyboard handwriting based
on neural networks" [10], studied the possibility of using a neural network in the form of a two-layer
feedforward network with 70 sigmoid hidden neurons and 10 sigmoid output neurons.
The user identification accuracy obtained with that method is 88-93%.</p>
      <p>Most of the works considered are based on geometric recognition methods, which use various measures
of closeness between the handwriting sample and its reference (Euclidean, Mahalanobis, etc.). The
highest authentication probability reported for such systems is 90%. Methods based on
neural networks have higher accuracy (85.22-93.59%), but it should be noted that this research was
conducted using multilayer neural networks, which significantly affected their training speed.</p>
      <p>The results of comparing the accuracy of user authentication by existing methods are presented in
Table 1 below.</p>
      <p>The analysis of existing methods of user identification by keyboard handwriting shows
that the accuracy of identification ranges from 85.22% to 93.59%, so increasing the accuracy of
identification and developing an appropriate method remain important.</p>
      <p>Therefore, it makes sense to consider injecting an error into the word used to verify the user,
which can make it possible to collect more accurate data on a person's handwriting and reduce the risk of forgery. It
should also be borne in mind that a person's keyboard handwriting, like ordinary handwriting, can change over time,
which leads to incorrect verification, whereas the user's reaction to an error remains relatively constant
regardless of changes in typing speed or in how correctly the user types. To increase the security of code word
storage, a genetic approach to generating the hash function of the word can be considered.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Problem Statement</title>
      <p>The aim is to study the possibility of using a neural network and a hash function based on a genetic algorithm
to improve the accuracy of user authentication based on keyboard handwriting with error injection,
to propose a method based on this mathematical apparatus, and to compare the proposed method
of authentication with existing ones.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed method</title>
      <p>It is proposed to inject a generated error into the code word in order to collect the necessary information
on the user's keyboard handwriting and to verify that the user has entered the word correctly, and
to use a hash function based on a genetic algorithm to increase the security of code word storage. A
two-layer feedforward neural network with 70 sigmoid
hidden neurons and 10 sigmoid output neurons was chosen to
study keyboard handwriting directly.</p>
      <p>The proposed method of user authentication (Figure 1) consists of the following steps:
1. The user enters a code word (the code word is pre-stored in the database as a hash generated with the
genetic algorithm, hash 1).
2. A random position of a character in the word is generated; this character will be changed.
3. A random position of a second character is generated, with which the character from the previous step will
be swapped.
4. The two characters are swapped, which creates an error in the code word.
5. The user corrects the word. A hash of the corrected word is generated with the
genetic algorithm, hash 2.
6. The two hashes are compared. If they are the same, the user has made the correction
and the method continues. If the hashes do not match, the user returns to step 5.
7. The user is identified by a trained neural network.
8. If all data match, the user has successfully passed authentication; if not, the user returns to
step 1.</p>
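      <p>Steps 2 to 6 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and SHA-256 from the standard library stands in for the genetic-algorithm hash, whose construction is not specified in enough detail here to reproduce.</p>
      <preformat><![CDATA[
```python
import hashlib
import random


def inject_error(word, rng):
    """Swap two characters of `word` so that the result differs from it."""
    positions = range(len(word))
    # Keep only position pairs whose characters differ, so the swap is visible.
    pairs = [(i, j) for i in positions for j in positions
             if i < j and word[i] != word[j]]
    if not pairs:
        raise ValueError("code word needs at least two distinct characters")
    i, j = rng.choice(pairs)
    chars = list(word)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def placeholder_hash(word):
    """Stand-in for the genetic-algorithm hash: plain SHA-256 (an assumption)."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def correction_accepted(stored_hash, corrected_word):
    """Step 6: compare the stored hash with the hash of the corrected word."""
    return placeholder_hash(corrected_word) == stored_hash


rng = random.Random(0)
code_word = "keyboard"                 # hypothetical code word
stored = placeholder_hash(code_word)   # hash 1, stored at enrolment
shown = inject_error(code_word, rng)   # word with the injected error
```
]]></preformat>
      <p>In this sketch the keystroke timings recorded while the user repairs the word shown would then be passed to the neural network in step 7.</p>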
      <p>This method uses a hash function based on a genetic algorithm whose
main operators are crossover on the worst gene, mutation of the two worst genes and a fitness
function that uses analysis based on five statistical tests (monobit test, poker test, runs test, long runs
test and autocorrelation test). Generating the hash function in this way increases
the security of code word storage.</p>
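      <p>As an illustration of the kind of check such a fitness function could apply, the sketch below implements the monobit (frequency) test in its common textbook form, where the statistic (n0 - n1)^2 / n is compared with the chi-square critical value for one degree of freedom. How the five tests are combined into a single fitness score is not described here, so the function names and the 5% significance level are assumptions.</p>
      <preformat><![CDATA[
```python
def to_bits(data):
    """Expand bytes into a list of 0/1 values, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def monobit_statistic(bits):
    """Return (n0 - n1)^2 / n, which is small when zeros and ones are balanced."""
    n = len(bits)
    ones = sum(bits)
    zeros = n - ones
    return (zeros - ones) ** 2 / n


# Chi-square critical value, 1 degree of freedom, 5% level (assumed level).
CHI2_1DOF_5PCT = 3.8415


def passes_monobit(bits):
    """True when the bit sequence is not rejected by the monobit test."""
    return monobit_statistic(bits) <= CHI2_1DOF_5PCT
```
]]></preformat>
      <p>A candidate hash output would be converted with to_bits and scored; the other four tests would follow the same pattern with their own statistics.</p>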
      <p>The stage of user identification based on a trained neural network includes nine main stages of
collecting information with the time functions and processing it further: collection
of all necessary data; data preparation and normalization; operation of synchronization functions; principal
component analysis; automatic selection of learning parameters; network training; checking the
correctness of training; adjustment of parameters; readiness for further use.</p>
      <p>[Figure 1: The proposed method of user authentication (flowchart)]</p>
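      <p>To collect information, the proposed method uses five time functions (delay time, up-down
delay, down-down delay, up-up delay, total time), which gather the data needed for comparison
and identification of the user by the neural network. The first three are defined as follows:
1. Delay time: the duration of a keystroke, determined by
<disp-formula><label>(1)</label><tex-math>T^{d}_{i} = t^{r}_{i} - t^{p}_{i},</tex-math></disp-formula>
where <inline-formula><tex-math>t^{r}_{i}</tex-math></inline-formula> is the release time of the <inline-formula><tex-math>i</tex-math></inline-formula>-th key and <inline-formula><tex-math>t^{p}_{i}</tex-math></inline-formula> is the time of pressing the <inline-formula><tex-math>i</tex-math></inline-formula>-th key.
2. «Up-down» delay: the time difference between releasing a key and pressing the next key
<disp-formula><label>(2)</label><tex-math>T^{ud}_{i} = t^{p}_{i+1} - t^{r}_{i},</tex-math></disp-formula>
where <inline-formula><tex-math>t^{r}_{i}</tex-math></inline-formula> is the release time of the <inline-formula><tex-math>i</tex-math></inline-formula>-th key and <inline-formula><tex-math>t^{p}_{i+1}</tex-math></inline-formula> is the time of pressing the next key.
3. «Down-down» delay: the time difference between pressing a key and pressing the next key
<disp-formula><label>(3)</label><tex-math>T^{dd}_{i} = t^{p}_{i+1} - t^{p}_{i},</tex-math></disp-formula>
where <inline-formula><tex-math>t^{p}_{i}</tex-math></inline-formula> is the time of pressing the <inline-formula><tex-math>i</tex-math></inline-formula>-th key.</p>
      <p>The output of a neuron is described by
<disp-formula><label>(6)</label><tex-math>y = f(u), \quad u = \sum_{i=1}^{n} w_{i} x_{i} + w_{0} x_{0},</tex-math></disp-formula>
where <inline-formula><tex-math>f(u)</tex-math></inline-formula> is the activation function; <inline-formula><tex-math>u</tex-math></inline-formula> is the induced local field; <inline-formula><tex-math>w_{i}</tex-math></inline-formula> are the input weights; <inline-formula><tex-math>x_{i}</tex-math></inline-formula> are the signals at the
input of the neuron; <inline-formula><tex-math>x_{0}</tex-math></inline-formula> is the additional input; and <inline-formula><tex-math>w_{0}</tex-math></inline-formula> is the weight corresponding to it.</p>
      <p>Let the number of input parameters be two. To begin with, one hidden
layer of neurons is investigated. The number of elements in it is determined using the
Arnold-Kolmogorov-Hecht-Nielsen formula
<disp-formula><label>(7)</label><tex-math>\frac{N_{y} Q}{1 + \log_{2} Q} \le N_{w} \le N_{y} \left( \frac{Q}{N_{x}} + 1 \right) \left( N_{x} + N_{y} + 1 \right) + N_{y},</tex-math></disp-formula>
where <inline-formula><tex-math>N_{y}</tex-math></inline-formula> is the dimension of the output signal; <inline-formula><tex-math>Q</tex-math></inline-formula> is the number of elements in the set of training
examples; <inline-formula><tex-math>N_{w}</tex-math></inline-formula> is the required number of synaptic connections; and <inline-formula><tex-math>N_{x}</tex-math></inline-formula> is the dimension of the input signal.</p>
      <p>After performing the calculations, we can conclude that the required number of synaptic connections
is in the range 7 &lt; <inline-formula><tex-math>N_{w}</tex-math></inline-formula> &lt; 20. The number of required neurons <inline-formula><tex-math>N</tex-math></inline-formula> in the hidden layer is then found
from the formula
<disp-formula><label>(8)</label><tex-math>N = \frac{N_{w}}{N_{x} + N_{y}}.</tex-math></disp-formula></p>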
      <p>Thus, the number of neurons N in the hidden layer lies in the range 1 &lt; N &lt; 70. Within
this range, the number of hidden neurons should be chosen so that the training error
is smallest. In this case, it is advisable to use 70 neurons in the hidden layer.</p>
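      <p>The sizing rule can be tried out numerically. The sketch below assumes the usual form of the Arnold-Kolmogorov-Hecht-Nielsen bounds on the number of synaptic connections and the relation N = N_w / (N_x + N_y) for the hidden layer size. The function names are hypothetical, and the input values are illustrative only: they are not the values used in this work and do not reproduce the ranges quoted above.</p>
      <preformat><![CDATA[
```python
import math


def synapse_bounds(n_x, n_y, q):
    """Lower and upper bound on the number of synaptic connections N_w.

    n_x: input dimension, n_y: output dimension, q: number of training examples.
    """
    lower = n_y * q / (1 + math.log2(q))
    upper = n_y * (q / n_x + 1) * (n_x + n_y + 1) + n_y
    return lower, upper


def hidden_neurons(n_w, n_x, n_y):
    """Hidden layer size implied by n_w connections (assumed relation)."""
    return n_w / (n_x + n_y)


# Illustrative values only: 2 inputs, 1 output, 16 training examples.
low_w, high_w = synapse_bounds(n_x=2, n_y=1, q=16)
low_n = hidden_neurons(low_w, n_x=2, n_y=1)
high_n = hidden_neurons(high_w, n_x=2, n_y=1)
```
]]></preformat>
      <p>The final size would still be picked inside the resulting range by comparing training errors, as described above.</p>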
      <p>Activation functions must also be selected for each layer. Neurons of the input and output layers
are responsible only for data input and output, so their functions can be left linear. The main computational
load falls on the neurons of the hidden layer, so its activation function should be sigmoid.</p>
      <p>In feedforward neural networks, synaptic connections are organized in such a way that each
neuron at a given level of the hierarchy receives information only from some non-empty set of neurons
located at a lower level. The name of these networks indicates that they have a distinguished direction
of signal propagation: signals move from the input through one or more hidden layers to the output
layer. It is easy to see that a multilayer neural network can be obtained by cascading single-layer
networks with weight matrices
<disp-formula><label>(9)</label><tex-math>W_{1}, W_{2}, \ldots, W_{p},</tex-math></disp-formula>
where <inline-formula><tex-math>p</tex-math></inline-formula> is the number of layers of the neural network.</p>
      <p>If the activation functions are linear, a multilayer neural network can be reduced to an
equivalent single-layer network with the weight matrix <inline-formula><tex-math>W = W_{1} W_{2} \cdots W_{p}</tex-math></inline-formula>, so building such
structures makes sense only if nonlinear activation functions are used in the neurons.</p>
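      <p>The reduction of linear layers to a single layer can be checked on a small example. The matrices below are arbitrary illustrative values, and the input is treated as a row vector multiplied from the left, which is an assumed convention.</p>
      <preformat><![CDATA[
```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


x = [[1, 2]]                         # one input row vector with two features
w1 = [[1, 0, 2], [-1, 3, 1]]         # first linear layer, 2 inputs to 3 units
w2 = [[2, 1], [0, 1], [1, -1]]       # second linear layer, 3 units to 2 outputs

layer_by_layer = matmul(matmul(x, w1), w2)   # pass through both layers
collapsed = matmul(x, matmul(w1, w2))        # single layer with W = W1 * W2
```
]]></preformat>
      <p>Both routes give the same output, which is why stacking layers only adds expressive power once a nonlinear activation sits between them.</p>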
      <p>The proposed neural network, shown in Figure 2, is a two-layer feedforward
network with 70 sigmoid hidden neurons and 10 sigmoid output neurons.</p>
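      <p>A forward pass through a network of this shape can be sketched with the standard library alone. The layer sizes (2 inputs, 70 hidden neurons, 10 outputs) follow the text; the random weight range, the seed and the sample input are assumptions made only so that the example runs.</p>
      <preformat><![CDATA[
```python
import math
import random


def sigmoid(u):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-u))


def init_layer(n_in, n_out, rng):
    """One weight row per neuron; index 0 is the weight of the constant input."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def layer_forward(weights, inputs):
    """Apply y = f(u), u = w0 + sum(w_i * x_i), to every neuron of a layer."""
    outputs = []
    for w in weights:
        u = w[0] + sum(wi * xi for wi, xi in zip(w[1:], inputs))
        outputs.append(sigmoid(u))
    return outputs


def forward(network, inputs):
    """Propagate the inputs through all layers in order."""
    for layer in network:
        inputs = layer_forward(layer, inputs)
    return inputs


rng = random.Random(42)
network = [init_layer(2, 70, rng), init_layer(70, 10, rng)]
scores = forward(network, [0.12, 0.30])   # hypothetical normalized timings
```
]]></preformat>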
      <p>[Figure 2: The proposed neural network: input (2), hidden layer with weights w and biases b, output layer, output (10)]</p>
      <sec id="sec-4-4">
        <title>Output 10</title>
        <p>A detailed diagram of the proposed neural network is presented in Figure 3.
The input for the network will be the key hold time and the time intervals between keystrokes.</p>
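        <p>These inputs can be derived from raw key events. The sketch below computes the key hold time and the up-down and down-down intervals from press and release timestamps. The event format (a list of press and release times in milliseconds, one pair per key in typing order) and the function name are assumptions.</p>
        <preformat><![CDATA[
```python
def timing_features(events):
    """Compute timing features from (press_time, release_time) pairs.

    `events` lists one pair per typed key, in typing order.
    """
    hold = [release - press for press, release in events]
    # Release of one key to press of the next key.
    up_down = [events[i + 1][0] - events[i][1] for i in range(len(events) - 1)]
    # Press of one key to press of the next key.
    down_down = [events[i + 1][0] - events[i][0] for i in range(len(events) - 1)]
    return {"hold": hold, "up_down": up_down, "down_down": down_down}


# Three hypothetical keystrokes, times in milliseconds.
sample_events = [(0, 80), (150, 240), (300, 370)]
features = timing_features(sample_events)
```
]]></preformat>
        <p>A negative up-down value would simply mean that the next key was pressed before the previous one was released.</p>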
        <sec id="sec-4-4-1">
          <title>Training is carried out as follows:</title>
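          <p>1. All weights of the network are initialized to small random values.
2. The input training vector <inline-formula><tex-math>X</tex-math></inline-formula> is fed to the network input and the <inline-formula><tex-math>NET</tex-math></inline-formula> signal from each neuron
is calculated using the standard expression
<disp-formula><label>(10)</label><tex-math>NET_{j} = \sum_{i} x_{i} w_{ij}.</tex-math></disp-formula>
3. The value of the threshold activation function for the <inline-formula><tex-math>NET</tex-math></inline-formula> signal from each neuron is
calculated, giving the output <inline-formula><tex-math>OUT_{j}</tex-math></inline-formula>.
4. The error for each neuron is calculated by subtracting the output from the desired output
<disp-formula><label>(11)</label><tex-math>\delta_{j} = T_{j} - OUT_{j}.</tex-math></disp-formula>
5. Each weight is modified as follows
<disp-formula><label>(12)</label><tex-math>w_{ij}(t + 1) = w_{ij}(t) + \eta\, x_{i}\, \delta_{j},</tex-math></disp-formula>
where <inline-formula><tex-math>\eta</tex-math></inline-formula> is the learning rate.
6. Steps two through five are repeated until the error is small enough.</p>
          <p>[Figure 3: Detailed diagram of the proposed neural network, with inputs x1, x2 and neurons z1, z2, z3]</p>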
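          <p>The training loop described by the six steps above can be sketched as follows for a single layer of sigmoid neurons. The learning rate, stopping tolerance, epoch limit and the toy data set (the logical OR function) are assumptions chosen so that the example is small and runs quickly; they are not the settings used in this work.</p>
          <preformat><![CDATA[
```python
import math
import random


def sigmoid(u):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-u))


def train(samples, n_inputs, n_outputs, rate=0.5, tolerance=0.1,
          max_epochs=10000, seed=0):
    """Train one layer of sigmoid neurons with the update w += rate * x * error."""
    rng = random.Random(seed)
    # Step 1: small random weights; index 0 is the weight of a constant input 1.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)]
               for _ in range(n_outputs)]
    for _ in range(max_epochs):
        worst = 0.0
        for x, target in samples:
            xb = [1.0] + list(x)
            for j, w in enumerate(weights):
                net = sum(wi * xi for wi, xi in zip(w, xb))   # step 2
                out = sigmoid(net)                            # step 3
                error = target[j] - out                       # step 4
                for i in range(len(w)):                       # step 5
                    w[i] += rate * xb[i] * error
                worst = max(worst, abs(error))
        if worst < tolerance:                                 # step 6
            break
    return weights


def predict(weights, x):
    """Outputs of the trained layer for one input vector."""
    xb = [1.0] + list(x)
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, xb))) for w in weights]


or_samples = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (1,))]
weights = train(or_samples, n_inputs=2, n_outputs=1)
```
]]></preformat>
          <p>Training a network with a hidden layer additionally requires propagating the output errors back to the hidden weights, which this single-layer sketch leaves out.</p>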
          <p>The user recognition accuracy and its comparison with existing methods are
presented in the next section.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and discussion</title>
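      <p>Let us estimate the probability p of correctly recognizing the user from the observed frequency in n independent
experiments. Using the developed software, an experiment was conducted in which 5
users each entered their code word 100 times in succession. The number of access denials is
shown in Table 2.</p>
      <table-wrap>
        <label>Table 2</label>
        <caption>
          <p>Data on the number of denials of access for a real user</p>
        </caption>
        <table>
          <thead>
            <tr><th>User</th><th>Number of access denials</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>6</td></tr>
            <tr><td>2</td><td>3</td></tr>
            <tr><td>3</td><td>3</td></tr>
            <tr><td>4</td><td>7</td></tr>
            <tr><td>5</td><td>5</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>To check the applicability of the normal distribution law, the values of <inline-formula><tex-math>np</tex-math></inline-formula>
and <inline-formula><tex-math>n(1 - p)</tex-math></inline-formula>
are estimated.</p>
      <sec id="sec-5-1">
        <title>Assuming that</title>
        <p><inline-formula><tex-math>p \approx p^{*}</tex-math></inline-formula>, where <inline-formula><tex-math>p^{*}</tex-math></inline-formula> is the average access frequency, these values can be computed.</p>
        <p>The obtained values give grounds to believe that in this case the normal distribution law can be
applied. According to the tables, we find <inline-formula><tex-math>t_{\gamma} = 1.652</tex-math></inline-formula> for <inline-formula><tex-math>\gamma = 0.9</tex-math></inline-formula>. Next, the bounds <inline-formula><tex-math>p_{1}</tex-math></inline-formula> and <inline-formula><tex-math>p_{2}</tex-math></inline-formula> of the
confidence interval are calculated.</p>
        <p>Thus, the probability of correct user recognition is in the range from 92% to 98%.</p>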
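        <p>A rough check of this estimate can be made from the denial counts reported for the five users (6, 3, 3, 7 and 5 out of 100 attempts each). The interval formulas used in this work are not reproduced in the text, so the sketch below uses the textbook normal-approximation interval and pools all 500 attempts; both choices are assumptions, and the result is not expected to match the quoted bounds exactly.</p>
        <preformat><![CDATA[
```python
import math

denials = [6, 3, 3, 7, 5]      # access denials per user, as reported
attempts_per_user = 100
n = attempts_per_user * len(denials)

# Observed frequency of successful access over all attempts.
p_star = 1 - sum(denials) / n

t_gamma = 1.652                # value quoted in the text for gamma = 0.9

# Normal-approximation interval around p_star (assumed formula).
half_width = t_gamma * math.sqrt(p_star * (1 - p_star) / n)
low, high = p_star - half_width, p_star + half_width
```
]]></preformat>
        <p>Under these assumptions the interval falls inside the 92% to 98% range stated above; using fewer attempts per estimate would widen it.</p>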
        <p>A comparison of the proposed method with existing methods (Figure 4) showed that the proposed
method is 2.5-8.5% more accurate than the method of Leggett, Umphress and
Williams, 2-8% more accurate than the method of Rastorguev, 4.41-6.78% more accurate than the
method of Maheshwari and Pudi, and 2-6% more accurate than the method of Hanura. The
recognition accuracy of the method proposed in the authors' previous work was also improved by 4-5%.</p>
        <p>The scientific novelty of the work is the proposed method of user authentication by keyboard
handwriting based on a neural network and a genetic algorithm. Its distinguishing features are the use of a neural
network in the form of a two-layer feedforward network with 70 sigmoid hidden neurons and
10 sigmoid output neurons, the injection of an error into the code
word, and the use of a hash function based on a genetic algorithm with crossover on the worst gene,
mutation of the two worst genes and a fitness function that uses analysis based on five statistical tests.
This made it possible to increase the recognition accuracy of the user's keyboard handwriting to 92-98%, which
is 3-11% better than existing methods.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>An experimental study was made of the possibility of using a neural network and a genetic algorithm to
improve the accuracy of user identification based on keyboard handwriting with error injection.
As a result, a method of user authentication was proposed that is based on a two-level neural
network architecture using five time functions and a built-in sigmoid activation function to increase the
efficiency of the neural network. Error injection was also introduced, which made it possible to collect more
accurate data on a person's keyboard handwriting and to increase the accuracy of correct user recognition and
successful authentication by 3-11% compared to existing methods.</p>
      <p>The use of a hash function based on a genetic algorithm is proposed, which is aimed at increasing
the security of storing the code word in the database and at making it impossible to alter it,
since during authentication the hash values are compared rather than the code word itself.</p>
    </sec>
    <sec id="sec-7">
      <title>7. References</title>
      <p>[1] Gavan Leonard Tredoux, Steven J. Harrington. Method and system for providing authentication
through aggregate analysis of behavioral and time patterns. Xerox Corporation, Norwalk, CT,
2016.</p>
      <p>[2] El-Hajj, M., Chamoun, M., Fadlallah, A., &amp; Serhrouchni, A. “Analysis of authentication
techniques in Internet of Things (IoT).” In: 2017 1st Cyber Security in Networking Conference
(CSNet). IEEE, 2017, pp. 1-3. doi: 10.1109/CSNET.2017.8242006.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>