<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Digital Content Processing Method for Biometric Identification of Personality Based on Artificial Intelligence Approaches</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Eugene</forename><surname>Fedorov</surname></persName>
							<email>fedorovee75@ukr.net</email>
							<affiliation key="aff0">
								<orgName type="institution">Cherkasy State Technological University</orgName>
								<address>
									<settlement>Cherkasy</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">E. O. Paton Electric Welding Institute</orgName>
								<address>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Digital Content Processing Method for Biometric Identification of Personality Based on Artificial Intelligence Approaches</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E6CACCA4D5A257EAA8714A1A5CD41918</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T13:08+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>digital content processing</term>
					<term>biometric identification of personality</term>
					<term>artificial neural network</term>
					<term>fuzzy inference systems</term>
					<term>genetic algorithm</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The paper proposes a method of digital content processing for biometric identification of a person based on artificial intelligence approaches. To achieve this goal, a method for generating digital content features, a model of the structure of a digital content processing system, and a method for determining the parameter values of the mathematical model of the digital content processing system are proposed. The proposed feature generation automates the processing of digital content, which increases the accuracy and speed of determining feature values. The proposed model structure of the digital content processing system represents knowledge in the form of rules that are easily accessible to human understanding, which simplifies determining the structure of the system, and also allows parallel processing of information, which increases the learning speed. The proposed method for determining the parameter values of the model, based on the genetic algorithm, combines directed and random search, which decreases the probability of getting stuck in a local extremum and provides an acceptable speed of determining the parameter values. The proposed method of digital content processing for biometric identification of a person by voice can be used in various intelligent digital content processing systems.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Human-machine interfaces are one of the directions of digital content processing. For these interfaces, biometric identification of a person is important.</p><p>Automated biometric identification of a person means decision making based on acoustic and visual information, which improves the quality of recognition of the person being studied <ref type="bibr" target="#b0">[1]</ref><ref type="bibr" target="#b1">[2]</ref><ref type="bibr" target="#b2">[3]</ref>. Unlike the traditional approach, computer biometric identification speeds up and improves the accuracy of the recognition process, which is especially critical in limited time conditions.</p><p>A special class of biometric identification of a person is formed by methods based on the analysis of acoustic information <ref type="bibr" target="#b3">[4]</ref><ref type="bibr" target="#b4">[5]</ref><ref type="bibr" target="#b5">[6]</ref><ref type="bibr" target="#b6">[7]</ref><ref type="bibr" target="#b7">[8]</ref>.</p><p>The methods of biometric identification of a person by voice include: dynamic programming <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>; vector quantization <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12]</ref>; artificial neural networks <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>; decision tree <ref type="bibr" target="#b14">[15]</ref>; Gaussian mixture models (GMM) <ref type="bibr" target="#b15">[16]</ref><ref type="bibr" target="#b16">[17]</ref><ref type="bibr" target="#b17">[18]</ref><ref type="bibr" target="#b18">[19]</ref>; their combination <ref type="bibr" target="#b19">[20]</ref>.</p><p>Artificial neural networks are the most popular methods. The advantages of neural networks consist in: the possibility of their training and adaptation; the ability to identify patterns in the data, their generalization, i.e. 
extracting knowledge from data, so no prior knowledge about the object (for example, its mathematical model) is required; parallel processing of information, which increases the computing power.</p><p>The disadvantages of neural networks include: the difficulty of determining the network structure, since there are no algorithms for calculating the number of layers and of neurons in each layer for specific applications; the difficulty of forming a representative sample; a high probability that the learning and adaptation methods get stuck in a local extremum; the inaccessibility to human understanding of the knowledge accumulated by the network (it is impossible to present the relationship between input and output in the form of rules), since it is distributed among all elements of the neural network and is represented by its weighting coefficients.</p><p>Recently, neural networks have been combined with fuzzy inference systems. The advantages of fuzzy inference systems are the following: knowledge is presented in the form of rules that are easily accessible to human understanding; no accurate assessment of the variables of the object is needed (incomplete and inaccurate data are acceptable).</p><p>The disadvantages of fuzzy inference systems include: the impossibility of training and adaptation (the parameters of the membership functions cannot be configured automatically); the lack of parallel processing of information, which limits the computing power.</p><p>Since genetic algorithms can be used instead of neural network learning algorithms to train the membership function parameters, we note their advantages and disadvantages.</p><p>The advantage of genetic algorithms for neural network training is that the probability of getting stuck in a local extremum decreases.</p><p>The disadvantages of genetic algorithms for neural network training are the following: the speed of the solution search is lower than that of neural network training methods; in the 
case of binary genes, an increase in the search space reduces the accuracy of the solution at a constant chromosome length; in the case of binary genes, encoding/decoding operations reduce the speed of the algorithm.</p><p>In this regard, it is relevant to create a method of digital content processing for biometric identification of a person that eliminates these drawbacks.</p><p>The aim of the work is to increase the efficiency of the digital content processing system by means of an artificial neuro-fuzzy network trained on the basis of the genetic algorithm.</p><p>To achieve this goal, it is necessary to solve the following tasks:</p><p>1. Generation of digital content attributes.</p><p>2. Creation of a model of the digital content processing system.</p><p>3. Choice of the structure of the method for determining the parameter values of the mathematical model of the digital content processing system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Generation of digital content attributes</head><p>The generation of digital content attributes in the case of biometric identification of a person by voice provides for the following steps:</p><p>─ determination of vocal segments of a speech signal based on statistical estimation of short-term energies; ─ definition of formants of the central frame of the vocal segment; ─ choice of vocal speech sound attributes based on formants of the central frame of the vocal segment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Determination of vocal segments of a speech signal based on statistical estimation of short-term energies</head><p>The paper proposes a method for determining vocal segments of a speech signal based on statistical estimation of short-term energies, which includes the following steps:</p><p>1. Set a speech signal with one vocal sound y(n), n = 1, …, N_f. Set the number of quantization levels of the speech signal L (for an 8-bit sound sample L = 256). Set the length N of the frame on which the short-term energy is calculated, N = 2^b + 1, where the integer parameter b is selected from the inequality</p><formula xml:id="formula_0">b − 1 &lt; log2(f_s / f_min) ≤ b,</formula><p>where f_s is the sampling frequency and f_min is the minimum expected fundamental frequency.</p><p>2. Calculate the short-term energies</p><formula xml:id="formula_1">E(n) = Σ_{m=−N/2}^{N/2} y²(n + m), n = N/2 + 1, …, N_f − N/2 − 1.</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Calculate the mathematical expectation of short-term energies</head><p>On the basis of the mathematical expectation of the short-term energies, the statistical threshold T is obtained, and the borders of the vocal segment are determined:</p><p>6.2. If E(n) &lt; T and E(n + 1) ≥ T, then set the left border N_l = n + 1;</p><p>6.3. If E(n) ≥ T and E(n + 1) &lt; T, then set the right border N_r = n and proceed to completion;</p><p>6.4. If n + 1 ≤ N_f − N/2 − 1, then go to the next sample, i.e. n = n + 1, and go to step 6.2; otherwise set N_r = n and proceed to completion.</p><p>As a result, the left and right boundaries of the vocal segment are determined. For the method of formant determination, the frame centered at the sample with the number</p><formula xml:id="formula_3">N_c = round((N_l + N_r)/2)</formula><p>is selected as the central frame.</p></div>
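The energy computation and threshold-crossing boundary search described above can be sketched in Python. This is a simplified illustration, not the paper's implementation: the frame handling and the fixed threshold `T` are placeholders (the paper derives the threshold statistically from the short-term energies).

```python
def short_term_energy(y, N):
    """Short-term energy E(n) over a centered frame of odd length N (illustrative)."""
    half = N // 2
    return [sum(y[m] ** 2 for m in range(n - half, n + half + 1))
            for n in range(half, len(y) - half)]

def vocal_segment(E, T):
    """Return (left, right) indices of the span where energy stays above threshold T."""
    left = right = None
    for n, e in enumerate(E):
        if e >= T and left is None:
            left = n                    # first crossing above T: left border
        if e < T and left is not None:
            right = n - 1               # first drop below T: right border
            break
    if left is not None and right is None:
        right = len(E) - 1              # segment runs to the end of the signal
    return left, right
```

For example, a short burst of unit samples surrounded by silence yields a single high-energy span whose borders the threshold rule recovers.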
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Definition of formants of the central frame of the vocal segment</head><p>The paper proposes a method for determining the formants of the central frame of the vocal segment based on linear prediction coding (LPC), which includes the following steps:</p><p>1. Balance the spectrum, which falls off steeply in the high-frequency region, by pre-emphasis filtering</p><formula xml:id="formula_4">s̃(m) = s(m) − μ·s(m − 1), m = N_c − N/2, …, N_c + N/2,</formula><p>where μ is the filtering parameter, 0 &lt; μ &lt; 1.</p><p>2. Apply the Hamming window and calculate the autocorrelation function R(k)</p><formula xml:id="formula_5">s̄(m) = s̃(m)·w(m), w(m) = 0.54 − 0.46·cos(2πm/N), R(k) = Σ_{m=N_c−N/2}^{N_c+N/2−k} s̄(m)·s̄(m + k), k = 0, …, p,</formula><p>where w(m) is the Hamming window and p is the linear prediction order,</p><formula xml:id="formula_6">ceil(f_d/1000) ≤ p ≤ ceil(f_d/1000) + 5,</formula><p>where ceil(f) rounds f up to the nearest integer.</p><p>3. Calculate the LPC coefficients a_j in accordance with the Durbin procedure <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22]</ref>:</p><formula xml:id="formula_7">3.1. E^(0) = R(0);
3.2. k_i = [R(i) − Σ_{j=1}^{i−1} α_j^(i−1)·R(i − j)] / E^(i−1);
3.3. α_i^(i) = k_i;
3.4. α_j^(i) = α_j^(i−1) − k_i·α_{i−j}^(i−1), 1 ≤ j ≤ i − 1;
3.5. E^(i) = (1 − k_i²)·E^(i−1);
3.6. i = i + 1;
3.7. if i ≤ p, then go to step 3.2;
3.8. a_j = α_j^(p), 1 ≤ j ≤ p.</formula><p>4. Calculate the gain coefficient G</p><formula xml:id="formula_11">G² = E^(p) = R(0) − Σ_{k=1}^{p} a_k·R(k).</formula><p>5. Calculate the logarithmic energy spectrum using the gain and LPC coefficients</p><formula xml:id="formula_12">W(k) = G² / |1 − Σ_{j=1}^{p} a_j·exp(−i·2πjk/N)|², k = 0, …, N − 1.</formula><p>6. Calculate the frequencies and amplitudes of the formants in the logarithmic energy spectrum of the central frame:</p><p>6.1. Set the frequency number k = 0 and the number of formants i = 0;</p><p>6.2. If 10·lg W(k − 1) &lt; 10·lg W(k) and 10·lg W(k) &gt; 10·lg W(k + 1), then fix the formant frequency, i.e. F_{i+1} = k, and the formant amplitude, i.e. A_{i+1} = 10·lg W(k), and increase the number of local extrema, i.e. i = i + 1;</p><p>6.3. If i &lt; 3, then go to the next frequency, i.e. k = k + 1, and go to step 6.2.</p></div>
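The Durbin (Levinson-Durbin) recursion of step 3 can be sketched in pure Python; variable names are illustrative, and the function simply maps the autocorrelations R(0..p) to the LPC coefficients a_1..a_p and the final prediction error.

```python
def levinson_durbin(R, p):
    """Levinson-Durbin recursion: compute LPC coefficients a[1..p] from the
    autocorrelation sequence R[0..p]. Returns (a, prediction_error)."""
    a = [0.0] * (p + 1)          # a[0] is unused; a[j] holds alpha_j
    E = R[0]                     # initial prediction error E^(0) = R(0)
    for i in range(1, p + 1):
        # reflection coefficient k_i
        acc = R[i] - sum(a[j] * R[i - j] for j in range(1, i))
        k = acc / E
        a_new = a[:]
        a_new[i] = k
        for j in range(1, i):    # update previous coefficients
            a_new[j] = a[j] - k * a[i - j]
        a = a_new
        E *= (1.0 - k * k)       # E^(i) = (1 - k_i^2) * E^(i-1)
    return a[1:], E
```

For an AR(1)-like autocorrelation R(k) = 0.5^k, the recursion recovers a single nonzero coefficient a_1 = 0.5, as expected.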
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Choice of vocal speech sound features based on formants of the central frame of the vocal segment</head><p>The following vocal speech sound features have been chosen:</p><p>─ the frequency of the first formant, x1 = F1; ─ the frequency of the second formant, x2 = F2; ─ the frequency of the third formant, x3 = F3; ─ the amplitude of the first anti-formant, x4 = A1; ─ the amplitude of the second anti-formant, x5 = A2; ─ the amplitude of the third anti-formant, x6 = A3.</p><p>The total number of features is denoted as Q = 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Creation of a model of digital content processing system</head><p>The proposed digital content processing system that performs biometric identification of a person by voice is the artificial neuro-fuzzy network, a graph model of which is shown in Fig. <ref type="figure">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Fig. 1. A graph model of digital content processing system.</head><p>The input (zero) layer contains N^(0) = Q neurons (corresponding to the number of features). The first hidden layer implements the fuzzification and contains N^(1) = M·Q neurons (corresponding to the number of values of the linguistic variables). The second hidden layer implements the aggregation of subconditions and contains N^(2) = M neurons (corresponding to the number of rules M). The third hidden layer implements the activation of conclusions and contains N^(3) = M neurons. The fourth hidden layer implements the aggregation of conclusions. The output layer implements the defuzzification and contains N^(5) = 1 neuron. All weighting coefficients are equal to 1. The creation of the mathematical model of digital content processing system involves the following steps: ─ formation of a fuzzy rule base; ─ fuzzification; ─ aggregation of subconditions; ─ activation of conclusions; ─ aggregation of conclusions; ─ defuzzification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Formation of a fuzzy rule base</head><p>Imagine the j-th fuzzy rule in the form</p><formula xml:id="formula_15">R_j: IF x_1 is α_1^j AND … AND x_Q is α_Q^j THEN y is β_j,</formula><p>where x_i is the name of the input linguistic variable, i = 1, …, Q; y is the name of the output linguistic variable; α_i^j is the fuzzy variable (the value of the linguistic variable x_i), j = 1, …, M, i = 1, …, Q; β_j is the fuzzy variable (the value of the linguistic variable y), j = 1, …, M.</p><p>The fuzzy set A_i^j is the range of values of the fuzzy variable α_i^j, and the fuzzy set B_j is the range of values of the fuzzy variable β_j.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Fuzzification</head><p>Let's determine the degree of truth of the i-th subcondition, i.e. establish the correspondence between the input variable x_i of the j-th rule and the value of the membership function μ_{A_i^j}(x_i). Since a number of methods related to person identification by voice use the Gauss function, we choose this function as μ_{A_i^j}(x_i), i.e.</p><formula xml:id="formula_18">μ_{A_i^j}(x_i) = exp(−(1/2)·((x_i − m_i^j)/σ_i^j)²),</formula><p>where m_i^j is the mathematical expectation and σ_i^j is the standard deviation.</p></div>
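The Gaussian membership function above is a one-liner; the following minimal sketch (names are illustrative) makes the fuzzification step concrete.

```python
import math

def gauss_membership(x, m, sigma):
    """Degree of truth of a subcondition: Gaussian membership mu(x)
    with expectation m and standard deviation sigma."""
    return math.exp(-0.5 * ((x - m) / sigma) ** 2)
```

The membership peaks at 1 when x equals the expectation m and decays symmetrically on either side.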
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Aggregation of subconditions</head><p>The membership function of the condition for the j-th rule is defined as</p><formula xml:id="formula_19">μ_{A^j}(x) = μ_{A_1^j}(x_1) · … · μ_{A_Q^j}(x_Q), j = 1, …, M.</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Activation of conclusions</head><p>The membership function of the conclusion for the j-th rule is defined as</p><formula xml:id="formula_20">μ_{C_j}(y) = μ_{A^j}(x) · μ_{B_j}(y), j = 1, …, M,</formula><p>where</p><formula>μ_{B_j}(y) = 0, y ≤ j − 0.5; (y − (j − 0.5))/0.5, j − 0.5 &lt; y ≤ j; ((j + 0.5) − y)/0.5, j &lt; y ≤ j + 0.5; 0, y &gt; j + 0.5</formula><p>is a triangular function centered at j.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Aggregation of conclusions</head><p>The membership function of the final conclusion is defined as</p><formula xml:id="formula_21">μ_C(y) = max(μ_{C_1}(y), …, μ_{C_M}(y)).</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6">Defuzzification</head><p>To obtain the class number, the membership function maximum method is used:</p><formula xml:id="formula_22">y = arg max_j μ_C(z_j),</formula><p>where z_j is the center of the fuzzy set C_j.</p><p>Thus, the mathematical model of digital content processing system (Fig. <ref type="figure">1</ref>) can be represented as</p><formula xml:id="formula_23">y = arg max_{j=1,…,M} [ μ_{B_j}(z_j) · Π_{i=1}^{Q} μ_{A_i^j}(x_i) ].</formula><p>The determination of the parameters of this system is carried out on the basis of the genetic algorithm.</p></div>
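Putting the inference chain together (Gaussian fuzzification, product aggregation of subconditions, maximum-based defuzzification), an end-to-end classification sketch might look as follows. This is a simplified illustration under the assumption that each rule j concludes its own class j with full strength at its center; the rule parameters and names are placeholders, not the paper's trained values.

```python
import math

def classify(x, rules):
    """x: feature vector; rules: one list of (m, sigma) pairs per rule,
    one pair per feature. Returns the index of the rule (class) whose
    aggregated condition membership is largest."""
    best_j, best_score = None, -1.0
    for j, params in enumerate(rules):
        score = 1.0
        for xi, (m, s) in zip(x, params):
            # product aggregation of Gaussian subcondition memberships
            score *= math.exp(-0.5 * ((xi - m) / s) ** 2)
        if score > best_score:
            best_j, best_score = j, score
    return best_j
```

With two rules centered at different points of the feature space, inputs near a rule's centers are assigned to that rule's class.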
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Choice of the structure of the method for determining parameter values of the mathematical model of digital content processing system</head><p>The choice of the structure of the genetic algorithm, which allows determining the parameter values of the mathematical model of digital content processing system, involves the following steps:</p><p>─ identification of individuals of the initial population; ─ definition of fitness function; ─ choice of reproduction (selection) operator; ─ choice of crossing-over operator; ─ choice of mutation operator; ─ choice of reduction operator; ─ definition of a stop condition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Identification of individuals of the initial population</head><p>Real-valued genes have been selected for the following reasons:</p><p>─ the ability to search in large spaces, which is difficult in the case of binary genes, where an increase in the search space reduces the accuracy of the solution at a constant chromosome length; ─ the ability to tune solutions locally; ─ the lack of encoding/decoding operations, which are necessary for binary genes, increases the speed of the algorithm; ─ proximity to the formulation of most applied problems (each real-valued gene is responsible for one variable or parameter, which is impossible in the case of binary genes).</p><p>An ordered vector of parameters (mathematical expectations and standard deviations of the membership functions) acts as the chromosome that represents the i-th individual of the population H = {h_i}.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Definition of fitness function</head><p>In the paper the following fitness function, which corresponds to the probability of correct identification of a person by voice, is proposed:</p><formula xml:id="formula_25">F = (1/P) · Σ_{p=1}^{P} I(y_p, d_p), I(y, d) = 1 if y = d, and 0 otherwise,</formula><p>where d_p is the response received from the object (person), y_p is the response obtained by the model, and P is the number of test implementations.</p></div>
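A fitness of this indicator-averaging form reduces to accuracy over the test set; a minimal sketch (function name is illustrative):

```python
def fitness(y_model, y_true):
    """Probability of correct identification: the fraction of the P test
    implementations where the model's response matches the person's."""
    assert len(y_model) == len(y_true) and y_model
    return sum(ym == yt for ym, yt in zip(y_model, y_true)) / len(y_true)
```
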
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Choice of reproduction (selection) operator</head><p>The following effective combination is used as the reproduction operator to select parameter vectors for crossing-over and mutation:</p><formula xml:id="formula_26">P(h_i) = exp(−1/g(t)) · 1/|H| + (1 − exp(−1/g(t))) · 2(|H| − i + 1)/(|H|(|H| + 1)),</formula><p>where the individuals h_i are ordered by decreasing fitness. Thus, in the early stages of the genetic algorithm, uniform selection is used to ensure that the entire search space is explored (random selection of chromosomes), and in the final stages, linearly ordered selection is used to make the search directed (the current best chromosomes are preserved). This combination does not require scaling and can be used to minimize the fitness function.</p></div>
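The blend of uniform and rank-based selection described above can be sketched as follows. This is an illustrative interpretation, not the paper's exact operator: the annealing weight `exp(-1/g(t))` decides, per draw, whether to sample uniformly or by linear ranking, and the ranking probabilities here follow the standard 2(n − rank)/(n(n + 1)) form.

```python
import math
import random

def select_index(fitness, g_t, rng=random):
    """Pick one individual's index. With weight exp(-1/g(t)) use uniform
    selection (early stages, g(t) large); otherwise use linear-rank
    selection that favours fitter individuals (late stages, g(t) small)."""
    n = len(fitness)
    w_uniform = math.exp(-1.0 / g_t) if g_t > 0 else 0.0
    if rng.random() < w_uniform:
        return rng.randrange(n)                      # uniform selection
    order = sorted(range(n), key=lambda i: fitness[i], reverse=True)
    total = n * (n + 1) // 2                         # sum of rank weights
    pick = rng.uniform(0, total)
    acc = 0.0
    for rank, i in enumerate(order):                 # rank 0 = best
        acc += n - rank                              # weight n, n-1, ..., 1
        if pick <= acc:
            return i
    return order[-1]
```

With g(t) near zero the operator degenerates to pure rank selection, so fitter individuals are chosen about twice as often as the weakest in a two-individual population.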
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Choice of crossing-over (crossover, recombination) operator</head><p>To combine the two variants of the parameter vector selected by the reproduction operator, uniform crossing-over is used as the crossing-over operator. Parents are selected through the following effective combination: in the early stages of the genetic algorithm, outbreeding is used to provide an investigation of the entire search space, and in the final stages, inbreeding is used to make the search directed. This combination does not require scaling and can be used to minimize the fitness function.</p><p>After the selection of parents, crossing-over is carried out and two descendants are produced.</p><p>For a global search for the optimal vector of parameters, it is necessary to increase the variety of variants.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Choice of mutation operator</head><p>To ensure the variety of variants of the parameter vector after crossing-over, a non-uniform mutation is used. The mutation step decreases as the iteration number grows and, to simulate annealing, so does the probability of mutation, where P_0 denotes its initial value. Thus, in the early stages of the genetic algorithm, a large-step mutation occurs with high probability, which provides an investigation of the entire search space, and in the final stages, the probability of mutation and its step tend to zero, which makes the search directed.</p></div>
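A Michalewicz-style non-uniform mutation with a step that shrinks toward zero as the iteration counter approaches its maximum can be sketched as follows; the step schedule `r · (1 − t/T)^b` and parameter names are an illustrative reading of the operator, not the paper's exact formula.

```python
import random

def nonuniform_mutate(h, lo, hi, t, T, b=2.0, rng=random):
    """Non-uniform mutation of one real-valued gene h in [lo, hi]:
    the mutation step shrinks as iteration t approaches the maximum T."""
    r = rng.random()
    step = r * (1.0 - t / T) ** b    # large steps early, vanishing steps late
    if rng.random() >= 0.5:
        return h + (hi - h) * step   # move toward the upper bound
    return h - (h - lo) * step       # move toward the lower bound
```

By construction the mutated gene stays inside [lo, hi], and at t = T the step is exactly zero, so the gene is left unchanged.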
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6">Choice of reduction operator</head><p>The reduction operator creates a new population based on the previous population and the parameter vectors obtained by crossing-over and mutation. As the reduction operator, a (μ + λ) scheme is applied, which does not require scaling and can be used to minimize the fitness function.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.7">Definition of a stop condition</head><p>The following stop condition is proposed in the work:</p><formula xml:id="formula_29">max_i F(h_i) ≥ 1 − ε ∨ t ≥ T.</formula><p>The values of ε and T are chosen experimentally.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>6.</head><label>6</label><figDesc>Determine the left and right borders of the vocal segment: 6.1. Set the sample number n = 1.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>The frequency of the first formant, x1 = F1; the frequency of the second formant, x2 = F2; the frequency of the third formant, x3 = F3; the amplitude of the first anti-formant, x4 = A1; the amplitude of the second anti-formant, x5 = A2; the amplitude of the third anti-formant, x6 = A3. The total number of features is denoted as Q = 6.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head></head><label></label><figDesc>response received from the object (person), p y is the response obtained by the model, P is the number of test implementations.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="1,0.00,190.95,595.32,460.02" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc>The mutation step: h′_ij = h_ij + (Max_j − h_ij)·r·(1 − t/T)^b if r ≥ 0.5, and h′_ij = h_ij − (h_ij − Min_j)·r·(1 − t/T)^b if r &lt; 0.5, where Max_j and Min_j are the maximum and minimum values of the j-th gene; t is the iteration number; T is the maximum number of iterations; r is a random number, r ∈ [0, 1]; b is the parameter controlling the speed of step decrease, b &gt; 0. To simulate annealing, the probability of mutation is defined as P_m = P_0·exp(−1/g(t)), g(t) = γ·g(t − 1), 0 &lt; γ &lt; 1, g(0) = T, T &gt; 0, where P_0 is the initial probability of mutation.</figDesc></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Numerical research</head><p>Table <ref type="table">1</ref> presents the probabilities of identification of a person by voice obtained on the TIMIT corpus with an artificial neural network of the multilayer perceptron type and with the proposed method. The artificial neural network had two hidden layers, each consisting of six neurons, like the input layer.</p><p>According to Table <ref type="table">1</ref>, the proposed method gives the best results.</p><p>Table <ref type="table">1</ref>. The probability of biometric identification of a person by voice.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method / Identification probability</head><p>Artificial neural network: 0.8. Proposed method: 0.98.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions</head><p>1. To solve the problem of increasing the efficiency of a digital content processing system for biometric identification of a person by voice, the corresponding speaker recognition methods have been investigated. These studies have shown that today the use of artificial neural networks in combination with a fuzzy inference system and a genetic algorithm is the most effective approach.</p><p>2. The proposed method of digital content processing for biometric identification of a person by voice automates the generation of digital content features; represents knowledge in the form of rules that are easily accessible to human understanding and simplifies the determination of the model structure, due to the fuzzy inference system; reduces the probability of falling into a local extremum and provides an acceptable speed of determining the parameter values of the model, due to the chosen structure of the genetic algorithm; and allows parallel processing of information, due to the artificial neural network.</p><p>3. As a result of a numerical study, it has been found that the proposed method of digital content processing provides a 0.98 probability of biometric identification of a person by voice, which exceeds the probability obtained by an artificial neural network of the multilayer perceptron type.</p><p>4. The proposed method of digital content processing for biometric identification of a person by voice can be used in various intelligent systems for digital content processing.</p></div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Guide to biometrics</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">M</forename><surname>Bolle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Connell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pankanti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">K</forename><surname>Ratha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">W</forename><surname>Senior</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2004">2004</date>
			<publisher>Springer</publisher>
			<pubPlace>New York</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Handbook of biometrics</title>
		<editor>Jain, A.K., Flynn, P., Ross, A.</editor>
		<imprint>
			<date type="published" when="2008">2008</date>
			<publisher>Springer</publisher>
			<pubPlace>New York, NY</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Dunstone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Yager</surname></persName>
		</author>
		<title level="m">Biometric system and data analysis: design, evaluation, and data mining</title>
				<meeting><address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Applications of Speaker Recognition</title>
		<author>
			<persName><forename type="first">N</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Shree</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.proeng.2012.06.363</idno>
	</analytic>
	<monogr>
		<title level="j">Procedia Engineering</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="3122" to="3126" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Speaker authentication</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Li</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>Springer-Verlag</publisher>
			<pubPlace>Berlin Heidelberg; Heidelberg</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Automatic speech and speaker recognition: large margin and kernel methods</title>
		<author>
			<persName><forename type="first">J</forename><surname>Keshet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>John Wiley &amp; Sons</publisher>
			<pubPlace>Chichester</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Self-learning speaker identification: a system for enhanced speech recognition</title>
		<author>
			<persName><forename type="first">T</forename><surname>Herbig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Minker</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>Springer</publisher>
			<pubPlace>Berlin</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Speaker recognition: a tutorial</title>
		<author>
			<persName><forename type="first">J</forename><surname>Campbell</surname></persName>
		</author>
		<idno type="DOI">10.1109/5.628714</idno>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the IEEE</title>
		<imprint>
			<date type="published" when="1997">1997</date>
			<biblScope unit="volume">85</biblScope>
			<biblScope unit="page" from="1437" to="1462" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">An Overview of Speaker Identification: Accuracy and Robustness Issues</title>
		<author>
			<persName><forename type="first">R</forename><surname>Togneri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pullella</surname></persName>
		</author>
		<idno type="DOI">10.1109/MCAS.2011.941079</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Circuits and Systems Magazine</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="23" to="61" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Fundamentals of speaker recognition</title>
		<author>
			<persName><forename type="first">H</forename><surname>Beigi</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>Springer</publisher>
			<pubPlace>New York</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">An overview of automatic speaker recognition technology</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Reynolds</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICASSP.2002.5745552</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE International Conference on Acoustics, Speech and Signal Processing</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="4072" to="4075" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">An overview of text-independent speaker recognition: From features to supervectors</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kinnunen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.specom.2009.08.009</idno>
	</analytic>
	<monogr>
		<title level="j">Speech Communication</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="page" from="12" to="40" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Robust text-independent speaker identification using Gaussian mixture speaker models</title>
		<author>
			<persName><forename type="first">D</forename><surname>Reynolds</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rose</surname></persName>
		</author>
		<idno type="DOI">10.1109/89.365379</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Speech and Audio Processing</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="72" to="83" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Speaker Recognition based on a Novel Hybrid Algorithm</title>
		<author>
			<persName><forename type="first">F.-Z</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.proeng.2013.08.007</idno>
	</analytic>
	<monogr>
		<title level="j">Procedia Engineering</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="page" from="220" to="226" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Speech recognition of deaf and hard of hearing people using hybrid neural network</title>
		<author>
			<persName><forename type="first">C</forename><surname>Jeyalakshmi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Krishnamurthi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Revathi</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICMEE.2010.5558589</idno>
	</analytic>
	<monogr>
		<title level="m">2nd International Conference on Mechanical and Electronics Engineering</title>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Comparison of Text Independent Speaker Identification Systems using GMM and i-Vector Methods</title>
		<author>
			<persName><forename type="first">P</forename><surname>Nayana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mathew</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Thomas</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.procs.2017.09.075</idno>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">115</biblScope>
			<biblScope unit="page" from="47" to="54" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Speech to text converter using Gaussian mixture model (GMM)</title>
		<author>
			<persName><forename type="first">V</forename><surname>Chauhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sh</forename><surname>Dwivedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Karale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Potdar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Research Journal of Engineering and Technology (IRJET)</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="160" to="164" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Automatic speaker recognition using Gaussian mixture speaker models</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Reynolds</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Speech and Audio Processing</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="1738" to="1752" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Method for parametric identification of Gaussian mixture model based on clonal selection algorithm</title>
		<author>
			<persName><forename type="first">E</forename><surname>Fedorov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Lukashenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Utkina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rudakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lukashenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR Workshop Proceedings</title>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2353</biblScope>
			<biblScope unit="page" from="41" to="55" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Combination of PNN network and DTW method for identification of reserved words, used in aviation during radio negotiation</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">J</forename><surname>Larin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">E</forename><surname>Fedorov</surname></persName>
		</author>
		<idno type="DOI">10.3103/S0735272714080044</idno>
	</analytic>
	<monogr>
		<title level="j">Radioelectronics and Communications Systems</title>
		<imprint>
			<biblScope unit="volume">57</biblScope>
			<biblScope unit="page" from="362" to="368" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Fundamentals of speech recognition</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">R</forename><surname>Rabiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B.-H</forename><surname>Juang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2005">2005</date>
			<publisher>Pearson Education</publisher>
			<pubPlace>Delhi</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Linear prediction of speech</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Markel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Gray</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1976">1976</date>
			<publisher>Springer-Verlag</publisher>
			<pubPlace>Berlin</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
