=Paper=
{{Paper
|id=Vol-2654/paper57
|storemode=property
|title=Digital Content Processing Method for Biometric Identification of Personality Based on Artificial Intelligence Approaches
|pdfUrl=https://ceur-ws.org/Vol-2654/paper57.pdf
|volume=Vol-2654
|authors=Eugene Fedorov,Tetyana Utkina,Kostiantyn Rudakov,Andriy Lukashenko,Serhii Mitsenko,Maryna Chychuzhko,Valentyna Lukashenko
|dblpUrl=https://dblp.org/rec/conf/cybhyg/FedorovURLMCL19
}}
==Digital Content Processing Method for Biometric Identification of Personality Based on Artificial Intelligence Approaches==
Eugene Fedorov1[0000-0003-3841-7373], Tetyana Utkina1[0000-0002-6614-4133],
Kostiantyn Rudakov1[0000-0003-0000-6077], Andriy Lukashenko2[0000-0002-6016-1899],
Serhii Mitsenko1[0000-0002-9582-7486], Maryna Chychuzhko1[0000-0001-5329-7897],
Valentyna Lukashenko1[0000-0002-6749-9040]
1 Cherkasy State Technological University, Cherkasy, Ukraine
{t.utkina, ckc, k.rudakov, s.mitsenko, m.chychuzhko}@chdtu.edu.ua, fedorovee75@ukr.net
2 E. O. Paton Electric Welding Institute, Kyiv, Ukraine
ineks-kiev@ukr.net
Abstract. The paper proposes a method of digital content processing for biometric identification based on artificial intelligence approaches. To achieve this goal, the paper proposes a method for generating digital content features, a structural model of the digital content processing system, and a choice of the structure of the method for determining the parameter values of the mathematical model of that system. The proposed feature generation automates the processing of digital content, which increases the accuracy and speed of determining the feature values. The proposed model structure represents knowledge in the form of rules that are easily accessible for human understanding, which simplifies determining the structure of the system, and allows parallel processing of information, which increases the learning speed. The proposed structure of the method for determining the model parameter values is based on a genetic algorithm that combines directed and random search, which decreases the probability of falling into a local extremum and provides an acceptable speed of determining the parameter values. The proposed method of digital content processing for biometric identification of a person by voice can be used in various intelligent digital content processing systems.
Keywords: digital content processing, biometric identification of personality,
artificial neural network, fuzzy inference systems, genetic algorithm.
1 Introduction
Human-machine interfaces are one of the directions of digital content processing. For
these interfaces, biometric identification of a person is important.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0). CybHyg-2019: International Workshop on
Cyber Hygiene, Kyiv, Ukraine, November 30, 2019.
Automated biometric identification of a person means decision making based on acoustic and visual information, which improves the quality of recognition of the person being studied [1-3]. Unlike the traditional approach, computer biometric identification speeds up the recognition process and improves its accuracy, which is especially critical under time constraints.
A special class of biometric identification of a person is formed by methods based
on the analysis of acoustic information [4-8].
The methods of biometric identification of a person by voice include: dynamic
programming [9, 10]; vector quantization [11, 12]; artificial neural networks [13, 14];
decision tree [15]; Gaussian mixture models (GMM) [16-19]; their combination [20].
Artificial neural networks are the most popular methods.
The advantages of neural networks are: the possibility of their training and adaptation; the ability to identify patterns in the data and generalize them, i.e. to extract knowledge from data, so that knowledge about the object (for example, its mathematical model) is not required; parallel processing of information, which increases the computing power.
The disadvantages of neural networks include: the difficulty of determining the network structure, since there are no algorithms for calculating the number of layers and the number of neurons in each layer for specific applications; the difficulty of forming a representative sample; a high probability that the learning and adaptation method falls into a local extremum; inaccessibility for human understanding of the knowledge accumulated by the network (it is impossible to present the relationship between input and output in the form of rules), since it is distributed between all elements of the neural network and is presented in the form of its weighting coefficients.
Recently, neural networks have been combined with fuzzy inference systems.
The advantages of fuzzy inference systems are the following: presentation of knowledge in the form of rules that are easily accessible for human understanding; no accurate assessment of the variables of the object is needed (the data may be incomplete and inaccurate).
The disadvantages of fuzzy inference systems include: the impossibility of their training and adaptation (the parameters of the membership functions cannot be configured automatically); the impossibility of parallel processing of information, which would otherwise increase the computing power.
Since genetic algorithms can be used instead of neural network learning algorithms to train the parameters of the membership functions, we note their advantages and disadvantages.
The advantage of genetic algorithms for neural network training is that the probability of falling into a local extremum decreases.
The disadvantages of genetic algorithms for neural network training are the following: the speed of the solution search is lower than that of neural network training methods; in the case of binary genes, an increase in the search space reduces the accuracy of the solution at a constant chromosome length; in the case of binary genes, there are encoding/decoding operations that reduce the speed of the algorithm.
In this regard, it is relevant to create a method of digital content processing for biometric identification of a person that eliminates these drawbacks.
The aim of the work is to increase the efficiency of the digital content processing system by means of an artificial neuro-fuzzy network that is trained on the basis of a genetic algorithm.
To achieve this goal, it is necessary to solve the following tasks:
1. Generation of digital content attributes.
2. Creation of a model of digital content processing system.
3. Choice of the structure of the method for determining the parameter values of the
mathematical model of digital content processing system.
2 Generation of digital content attributes
The generation of digital content attributes in the case of biometric identification of a
person by voice provides for the following steps:
─ determination of vocal segments of a speech signal based on statistical estimation
of short-term energies;
─ definition of formants of the central frame of the vocal segment;
─ choice of vocal speech sound attributes based on formants of the central frame of
the vocal segment.
2.1 Determination of vocal segments of a speech signal based on statistical
estimation of short-term energies
The paper proposes a method for determining vocal segments of a speech signal based
on statistical estimation of short-term energies, which includes the following steps:
1. Set a speech signal with one vocal sound $y(n)$, $n \in \overline{1, N^f}$. Set the number of
quantization levels of the speech signal $L$ (for an 8-bit sound sample, $L = 256$). Set
the length of the frame $\Delta N$, on which the short-term energy is calculated,
N 2b 1 , where the integer parameter b is selected from the inequality
$b - 1 < \log_2 (f_s / f_{\min}) \le b$, $f_s$ is the sampling frequency of the speech signal in Hz,
$8000 \le f_s \le 22050$, $f_{\min}$ is the minimum frequency of the fundamental human
tone in Hz, $f_{\min} = 50$. Set the parameter for the adaptive threshold $\alpha$, $0 < \alpha < 1$.
2. Calculate the short-term energies
$$E(n) = \sum_{m=-\Delta N/2}^{\Delta N/2} y^2(m+n), \quad n \in \overline{\Delta N/2 + 1,\ N^f - \Delta N/2 - 1}.$$
3. Calculate the mathematical expectation of the short-term energies
$$\mu = \frac{1}{N^f - \Delta N - 1} \sum_{n=\Delta N/2+1}^{N^f - \Delta N/2 - 1} E(n).$$
4. Calculate the standard deviation of the short-term energies
$$\sigma = \sqrt{\frac{1}{N^f - \Delta N - 1} \sum_{n=\Delta N/2+1}^{N^f - \Delta N/2 - 1} E^2(n) - \mu^2}.$$
5. Calculate the adaptive threshold $T$.
6. Determine the left and right borders of the vocal segment:
6.1. Set the sample number $n = 1$;
6.2. If $E(n) \le T \wedge E(n+1) > T$, then $N^l = n + 1$, go to step 6.4;
6.3. If $E(n) > T \wedge E(n+1) \le T$, then $N^r = n$, proceed to completion;
6.4. If $n < N^f - \Delta N - 1$, then go to the next sample, i.e. $n = n + 1$, go to step 6.2,
else $N^r = n$, proceed to completion.
As a result, the left and right boundaries of the vocal segment are determined. For
the formant determination method, the frame centered at the sample with the
number $N^c = \mathrm{round}((N^l + N^r)/2)$ is selected as the central frame.
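The segment search described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The function names, the 0-based indexing and the `frame_len` argument are hypothetical. The threshold rule `mu + alpha * sigma` is an assumption made only so that the sketch runs, since the text does not give the threshold formula.

```python
import numpy as np

def short_term_energy(y, frame_len):
    """Sum of squared samples over a window centred on each valid sample."""
    half = frame_len // 2
    sq = np.asarray(y, dtype=float) ** 2
    # 0-based centres chosen so that every window stays inside the signal
    centres = np.arange(half, len(sq) - half)
    energy = np.array([sq[c - half:c + half + 1].sum() for c in centres])
    return centres, energy

def vocal_segment(y, frame_len, alpha):
    """Return (left, right, centre) indices of the first vocal segment, or None."""
    centres, energy = short_term_energy(y, frame_len)
    mu = energy.mean()
    sigma = np.sqrt(max((energy ** 2).mean() - mu ** 2, 0.0))
    # ASSUMPTION: the source does not state the threshold formula
    threshold = mu + alpha * sigma
    above = energy > threshold
    if not above.any():
        return None
    first = int(np.argmax(above))             # first sample above the threshold
    drops = np.flatnonzero(~above[first:])    # where the energy falls back
    last = first + int(drops[0]) - 1 if drops.size else len(above) - 1
    left, right = int(centres[first]), int(centres[last])
    return left, right, int(round((left + right) / 2))

# Usage: a silent signal with a sine burst in samples 150..249
signal = np.zeros(400)
signal[150:250] = np.sin(2 * np.pi * 0.1 * np.arange(100))
left, right, centre = vocal_segment(signal, frame_len=16, alpha=0.5)
```

The returned centre plays the role of the central-frame sample used by the formant step.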
2.2 Definition of formants of the central frame of the vocal segment
The paper proposes a method for determining the formants of the central frame of the
vocal segment based on linear prediction coding, which includes the following steps:
1. Balance the spectrum, which declines steeply in the high-frequency region, by passing the signal through a pre-emphasis filter
$$\tilde{s}(m) = s(m+1) - \gamma s(m), \quad m \in \overline{N^c - \Delta N/2,\ N^c + \Delta N/2},$$
where $\gamma$ is the filtration parameter, $0 < \gamma < 1$.
2. Calculate the autocorrelation function $R(k)$
$$\hat{s}(m) = \tilde{s}(m)\,w(m), \quad w(m) = 0.54 - 0.46\cos\frac{2\pi m}{\Delta N},$$
$$R(k) = \sum_{m=N^c-\Delta N/2}^{N^c+\Delta N/2-1-k} \hat{s}(m)\,\hat{s}(m+k), \quad k \in \overline{0, p},$$
where $w(m)$ is the Hamming window, $p$ is the linear prediction order,
$\mathrm{ceil}(f_s/1000) \le p \le 5 + \mathrm{ceil}(f_s/1000)$, $\mathrm{ceil}(f)$ is the function that rounds $f$ up to the
nearest integer.
3. Calculate the LPC coefficients $a_j$ in accordance with the Durbin procedure
[21, 22]:
3.1. $E^{(0)} = R(0)$;
3.2. $k_i = \left( R(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R(i-j) \right) / E^{(i-1)}$;
3.3. $\alpha_i^{(i)} = k_i$;
3.4. $\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i \alpha_{i-j}^{(i-1)}$, $1 \le j \le i-1$;
3.5. $E^{(i)} = (1 - k_i^2) E^{(i-1)}$;
3.6. $i = i + 1$;
3.7. if $i \le p$, then go to step 3.2;
3.8. $a_j = \alpha_j^{(p)}$, $1 \le j \le p$.
4. Calculate the gain coefficient $G$
$$G = \sqrt{E} = \sqrt{R(0) - \sum_{k=1}^{p} a_k R(k)}.$$
5. Calculate the logarithmic energy spectrum using the gain and LPC coefficients
$$10 \lg W(k) = 10 \lg \frac{G^2}{\left(1 - \sum_{m=1}^{p} a_m \cos\frac{2\pi}{\Delta N}km\right)^2 + \left(\sum_{m=1}^{p} a_m \sin\frac{2\pi}{\Delta N}km\right)^2}, \quad k \in \overline{0, \Delta N - 1}.$$
6. Calculate the frequency and amplitude of the formants in the logarithmic energy
spectrum of the central frame:
6.1. Set the frequency number $k = 0$. Set the number of formants $i = 0$;
6.2. If $10\lg W(k) > 10\lg W(k-1) \wedge 10\lg W(k) > 10\lg W(k+1) \wedge 10\lg W(k) > 0$,
then fix the formant frequency, i.e. $F_{i+1} = k$, and the formant amplitude, i.e.
$A_{i+1} = 10\lg W(k)$, and increase the number of local extremums, i.e. $i = i + 1$;
6.3. If $i < 3$, then go to the next frequency, i.e. $k = k + 1$, go to step 6.2.
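The LPC steps above can be illustrated with a short Python sketch. It is a hedged illustration rather than the authors' code. The function names are hypothetical, the default pre-emphasis value `gamma=0.95` is an assumption, and the window is indexed from 0 inside the frame. The recursion itself is the standard Durbin procedure referenced in the text.

```python
import numpy as np

def levinson_durbin(R, p):
    """Durbin recursion: LPC coefficients a_1..a_p and the final error E."""
    alpha = np.zeros(p + 1)          # alpha[j] holds the j-th coefficient
    E = R[0]
    for i in range(1, p + 1):
        # reflection coefficient k_i
        k = (R[i] - np.dot(alpha[1:i], R[i - 1:0:-1])) / E
        new = alpha.copy()
        new[i] = k
        new[1:i] = alpha[1:i] - k * alpha[i - 1:0:-1]
        alpha = new
        E = (1.0 - k * k) * E
    return alpha[1:], E

def lpc_log_spectrum(frame, p, gamma=0.95):
    """Log energy spectrum 10*lg W(k) of one frame (gamma is an assumed value)."""
    s = np.asarray(frame, dtype=float)
    s = s[1:] - gamma * s[:-1]                       # pre-emphasis
    N = len(s)
    s = s * (0.54 - 0.46 * np.cos(2 * np.pi * np.arange(N) / N))  # Hamming
    R = np.array([np.dot(s[:N - k], s[k:]) for k in range(p + 1)])
    a, E = levinson_durbin(R, p)                     # E equals G^2
    k = np.arange(N)[:, None]
    m = np.arange(1, p + 1)[None, :]
    ang = 2 * np.pi * k * m / N
    denom = (1 - (a * np.cos(ang)).sum(axis=1)) ** 2 \
        + ((a * np.sin(ang)).sum(axis=1)) ** 2
    return 10 * np.log10(E / denom)

def first_formants(log_spec, count=3):
    """First `count` positive local maxima as (frequency index, amplitude)."""
    peaks = []
    for k in range(1, len(log_spec) - 1):
        v = log_spec[k]
        if v > log_spec[k - 1] and v > log_spec[k + 1] and v > 0:
            peaks.append((k, float(v)))
            if len(peaks) == count:
                break
    return peaks
```

For an autocorrelation sequence R = (1, 0.5, 0.25), which corresponds to a first-order process, `levinson_durbin` returns the coefficients (0.5, 0) and the error 0.75.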
2.3 Choice of vocal speech sound features based on formants of the central
frame of the vocal segment
The following vocal speech sound features have been chosen:
─ the frequency of the first formant $x_1 = F_1$;
─ the frequency of the second formant $x_2 = F_2$;
─ the frequency of the third formant $x_3 = F_3$;
─ the amplitude of the first formant $x_4 = A_1$;
─ the amplitude of the second formant $x_5 = A_2$;
─ the amplitude of the third formant $x_6 = A_3$.
The total number of features is $Q = 6$.
3 Creation of a model of digital content processing system
The proposed digital content processing system that performs biometric identification
of a person by voice is the artificial neuro-fuzzy network, a graph model of which is
shown in Fig. 1.
Fig. 1. A graph model of digital content processing system (inputs $x_1, \ldots, x_N$, nodes labelled $z^1, \ldots, z^M$, output $y$).
The input (zero) layer contains $N^{(0)} = Q$ neurons (corresponds to the number of
features). The first hidden layer implements the fuzzification and contains $N^{(1)} = MQ$
neurons (corresponds to the number of values of linguistic variables). The second
hidden layer implements the aggregation of subconditions and contains $N^{(2)} = M$
neurons (corresponds to the number of rules $M$). The third hidden layer implements
the activation of conclusions and contains $N^{(3)} = M^2$ neurons. The fourth hidden
layer implements the aggregation of conclusions and contains $N^{(4)} = M$ neurons.
The output layer implements the defuzzification and contains $N^{(5)} = 1$ neuron.
All weighting coefficients are equal to 1.
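As a worked example of the layer sizes, take the six features of Section 2.3 and a hypothetical rule count of ten. The value $M = 10$ is an assumption chosen only for illustration; the text does not fix it here.

```latex
\begin{aligned}
N^{(0)} &= Q = 6, & N^{(1)} &= MQ = 60, & N^{(2)} &= M = 10,\\
N^{(3)} &= M^2 = 100, & N^{(4)} &= M = 10, & N^{(5)} &= 1,
\end{aligned}
\qquad \sum_{l=0}^{5} N^{(l)} = 187 .
```

The third hidden layer dominates the count because it grows quadratically with the number of rules.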
The creation of the mathematical model of digital content processing system in-
volves the following steps:
─ formation of a fuzzy rule base;
─ fuzzification;
─ aggregation of subconditions;
─ activation of conclusions;
─ aggregation of conclusions;
─ defuzzification.
3.1 Formation of a fuzzy rule base
Let us represent the $j$-th fuzzy rule in the form
$R^j$: IF $x_1$ is $\tilde{\alpha}_1^j$ AND ... AND $x_Q$ is $\tilde{\alpha}_Q^j$ THEN $y$ is $\tilde{\beta}^j$,
where $x_i$ is the name of the input linguistic variable, $i \in \overline{1, Q}$; $y$ is the name of the
output linguistic variable; $\tilde{\alpha}_i^j$ is the fuzzy variable (the value of the linguistic variable
$x_i$), $j \in \overline{1, M}$, $i \in \overline{1, Q}$; $\tilde{\beta}^j$ is the fuzzy variable (the value of the linguistic variable
$y$), $j \in \overline{1, M}$.
The fuzzy set $A_i^j$ is the range of values of the fuzzy variable $\tilde{\alpha}_i^j$, the fuzzy set $B^j$
is the range of values of the fuzzy variable $\tilde{\beta}^j$.
3.2 Fuzzification
Let us determine the degree of truth of the $i$-th subcondition, i.e. establish the
correspondence between the input variables $x_i$ of the $j$-th rule and the values of the
membership function $\mu_{A_i^j}(x_i)$.
Since a number of methods related to person identification by voice use the Gauss
function, we choose this function as $\mu_{A_i^j}(x_i)$, i.e.
$$\mu_{A_i^j}(x_i) = \exp\left(-\frac{1}{2}\left(\frac{x_i - m_i^j}{\sigma_i^j}\right)^2\right),$$
where $m_i^j$ is the mathematical expectation, $\sigma_i^j$ is the standard deviation.
3.3 Aggregation of subconditions
The membership function of the condition for the $j$-th rule is defined as
$$\mu_{A^j}(\mathbf{x}) = \mu_{A_1^j}(x_1) \cdots \mu_{A_Q^j}(x_Q), \quad j \in \overline{1, M}.$$
3.4 Activation of conclusions
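The membership function of the conclusion for the $j$-th rule is defined as
$$\mu_{C^j}(y) = \mu_{A^j}(\mathbf{x})\,\mu_{B^j}(y), \quad j \in \overline{1, M},$$
where $\mu_{B^j}(y)$ is a triangular function:
$$\mu_{B^j}(y) = \begin{cases} 0, & y < j - 0.5 \\ (y - (j - 0.5))/0.5, & j - 0.5 \le y \le j \\ ((j + 0.5) - y)/0.5, & j < y \le j + 0.5 \\ 0, & y > j + 0.5 \end{cases}$$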
3.5 Aggregation of conclusions
The membership function of the final conclusion is defined as
$$\mu_C(y) = \max(\mu_{C^1}(y), \ldots, \mu_{C^M}(y)).$$
3.6 Defuzzification
To obtain the class number, the membership function maximum method is used:
$$y = \arg\max_{z^j} \mu_C(z^j),$$
where $z^j$ is the center of the fuzzy set $C^j$.
Thus, the mathematical model of digital content processing system (Fig. 1) can be
represented as
$$y = \arg\max_{z^k} \max_{j \in \overline{1, M}} \mu_{B^j}(z^k) \prod_{i=1}^{Q} \mu_{A_i^j}(x_i), \quad k \in \overline{1, M}.$$
The determination of the parameters of this system is carried out on the basis of the
genetic algorithm.
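The inference chain of this section (fuzzification, aggregation of subconditions, activation, aggregation of conclusions, defuzzification) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, class numbers run from 1 to M, and the centers of the output fuzzy sets are taken to be the class numbers themselves, as the triangular function centred at j suggests.

```python
import numpy as np

def gaussian_membership(x, m, sigma):
    """Fuzzification: x has shape (Q,), m and sigma have shape (M, Q)."""
    return np.exp(-0.5 * ((x - m) / sigma) ** 2)

def triangular_membership(y, j):
    """Triangle centred at class number j with half-width 0.5."""
    return np.maximum(0.0, 1.0 - np.abs(y - j) / 0.5)

def classify(x, m, sigma):
    """Return the class number (1..M) chosen by the maximum method."""
    M = m.shape[0]
    # aggregation of subconditions: product over the Q features
    cond = gaussian_membership(np.asarray(x, dtype=float), m, sigma).prod(axis=1)
    centres = np.arange(1, M + 1)            # ASSUMPTION: z^k = k
    # B[k, j] is the membership of centre z^k in the output set B^j
    B = triangular_membership(centres[:, None], centres[None, :])
    # activation (product) and aggregation of conclusions (max over rules)
    aggregated = (B * cond[None, :]).max(axis=1)
    return int(centres[np.argmax(aggregated)])

# Usage with two hypothetical speakers described by two features each
means = np.array([[0.0, 0.0], [5.0, 5.0]])
stds = np.ones((2, 2))
speaker = classify([4.8, 5.3], means, stds)
```

With integer centres and a half-width of 0.5, each centre belongs fully to its own output set and not at all to the others, so the result reduces to the rule with the largest condition membership.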
4 Choice of the structure of the method for determining
parameter values of the mathematical model of digital
content processing system
The choice of the structure of the genetic algorithm, which makes it possible to determine the parameter values of the mathematical model of digital content processing system, involves the following steps:
─ identification of individuals of the initial population;
─ definition of fitness function;
─ choice of reproduction (selection) operator;
─ choice of crossing-over operator;
─ choice of mutation operator;
─ choice of reduction operator;
─ definition of a stop condition.
4.1 Identification of individuals of the initial population
Real-valued genes have been selected for the following reasons:
─ the ability to search in large spaces, which is difficult to do in the case of binary
genes, when an increase in the search space reduces the accuracy of the solution
at a constant chromosome length;
─ the ability to tune solutions locally;
─ the absence of the encoding/decoding operations that are necessary for binary genes,
which increases the speed of the algorithm;
─ proximity to the formulation of most applied problems (each real-valued gene is
responsible for one variable or parameter, which is impossible in the case of binary
genes).
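An ordered vector of parameters (mathematical expectations and standard deviations)
acts as the chromosome, which represents the $i$-th individual of the population
$H = \{h_i\}$:
$$\begin{aligned} h_i = (\,& lx_1^1 + i\,\Delta m_1^1,\ lx_1^2 + i\,\Delta m_1^2,\ \ldots,\ lx_n^1 + i\,\Delta m_n^1,\ lx_n^2 + i\,\Delta m_n^2, \\ & lx_1^1 + i\,\Delta\sigma_1^1,\ lx_1^2 + i\,\Delta\sigma_1^2,\ \ldots,\ lx_n^1 + i\,\Delta\sigma_n^1,\ lx_n^2 + i\,\Delta\sigma_n^2\,), \quad i \in \overline{1, |H|}, \end{aligned}$$
$$\Delta m_k^j = \frac{rx_k^j - lx_k^j}{|H|}, \quad \Delta\sigma_k^j = \frac{rx_k^j - lx_k^j}{|H|}, \quad j \in \overline{1, M},$$
where $|H|$ is the population size, $lx_k^j$, $rx_k^j$ are the left and right boundaries of the
values of the $k$-th feature, determined experimentally.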
4.2 Definition of fitness function
The paper proposes the following fitness function, which corresponds to the probability of
correct identification of a person by voice:
$$F = \frac{1}{P} \sum_{p=1}^{P} I(y_p - d_p) \to \max_{m_i^j,\ \sigma_i^j}, \quad I(a) = \begin{cases} 1, & a = 0 \\ 0, & a \ne 0 \end{cases},$$
where $d_p$ is the response received from the object (person), $y_p$ is the response
obtained by the model, $P$ is the number of test implementations.
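The fitness function is the share of test implementations on which the model response matches the desired response. A minimal Python sketch, with a hypothetical function name and integer class numbers assumed for the responses:

```python
def fitness(predicted, desired):
    """Fraction of test cases where the model response equals the desired one."""
    hits = sum(1 for y, d in zip(predicted, desired) if y - d == 0)
    return hits / len(desired)

# Usage: three of four hypothetical test cases are identified correctly
score = fitness([1, 2, 3, 1], [1, 2, 2, 1])
```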
4.3 Choice of reproduction (selection) operator
The following effective combination is used as the reproduction operator to select parameter vectors for crossing-over and mutation:
$$P(h_i) = \frac{1}{|H|}\exp(-1/g(t)) + \frac{1}{|H|}\left(a - (2a - 2)\frac{i - 1}{|H| - 1}\right)\left(1 - \exp(-1/g(t))\right).$$
Thus, in the early stages of the genetic algorithm, a uniform selection is used to
ensure that the entire search space is studied (random selection of chromosomes), and
in the final stages, linearly ordered selection is used to make the search directed (the
current best chromosomes are preserved). This combination does not require scaling
and can be used to minimize fitness function.
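The blend of uniform and linearly ordered selection can be checked numerically with the sketch below. It is an illustration under assumptions: chromosomes are taken to be ranked with the best at position 1, the default `a=1.5` is a hypothetical value inside the range 1 to 2 that is conventional for linear ranking, and `g_t` stands for the current value of the annealing function g(t).

```python
import numpy as np

def selection_probabilities(pop_size, g_t, a=1.5):
    """Selection probability of each ranked chromosome (rank 1 first)."""
    i = np.arange(1, pop_size + 1)
    w = np.exp(-1.0 / g_t)                   # weight of the uniform part
    uniform = np.full(pop_size, 1.0 / pop_size)
    ranked = (a - (2 * a - 2) * (i - 1) / (pop_size - 1)) / pop_size
    return w * uniform + (1 - w) * ranked

early = selection_probabilities(10, g_t=1e6)   # large g(t): nearly uniform
late = selection_probabilities(10, g_t=0.01)   # small g(t): ranking dominates
```

Both parts sum to one separately, so their weighted mix is a valid probability distribution at every iteration.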
4.4 Choice of crossing-over (crossover, recombination) operator
To combine the two variants of the parameter vector selected by the reproduction operator, a uniform crossing-over is used as the crossing-over operator.
Parents are selected through the following effective combination: in the early stages of the genetic algorithm, outbreeding is used to explore the entire search space, and in the final stages, inbreeding is used to make the search directed. This combination does not require scaling and can be used to minimize fitness function.
After the selection of parents, crossing-over is carried out and two descendants are produced.
For a global search for the optimal vector of parameters, it is necessary to increase
the variety of options.
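Uniform crossing-over and the choice of a mate can be sketched as follows. This is a hedged example, not the authors' procedure: the function names are hypothetical, and the use of Euclidean distance to decide which chromosome is the most distant (outbreeding) or the closest (inbreeding) is an assumption, since the text does not name the distance measure.

```python
import numpy as np

def pick_mate(population, idx, outbreeding):
    """Index of the farthest (outbreeding) or nearest (inbreeding) chromosome."""
    # ASSUMPTION: Euclidean distance between parameter vectors
    dist = np.linalg.norm(population - population[idx], axis=1)
    if outbreeding:
        return int(np.argmax(dist))
    dist[idx] = np.inf                       # exclude the chromosome itself
    return int(np.argmin(dist))

def uniform_crossover(p1, p2, rng):
    """Each gene of a child comes from either parent with probability 0.5."""
    mask = rng.random(len(p1)) < 0.5
    child1 = np.where(mask, p1, p2)
    child2 = np.where(mask, p2, p1)          # complementary child
    return child1, child2

# Usage on a small hypothetical population
pop = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
far = pick_mate(pop, 0, outbreeding=True)
near = pick_mate(pop, 0, outbreeding=False)
c1, c2 = uniform_crossover(np.zeros(8), np.ones(8), np.random.default_rng(0))
```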
4.5 Choice of mutation operator
To ensure the variety of options for the vector of parameters after crossing-over, a
non-uniform mutation is used.
The mutation step is defined as
$$\Delta = \begin{cases} (Max_j - h_{ij})\, r \left(1 - \dfrac{t}{T}\right)^{b}, & r < 0.5 \\ (h_{ij} - Min_j)\, r \left(1 - \dfrac{t}{T}\right)^{b}, & r \ge 0.5 \end{cases},$$
where $Max_j$, $Min_j$ are the maximum and minimum values of the $j$-th gene; $t$ is the
iteration number; $T$ is the maximum number of iterations; $r$ is a random number,
$r \in [0, 1]$; $b$ is the parameter controlling the speed of step decrease, $b > 0$.
To simulate annealing, the probability of mutation is defined as
$$P_m = P_0 \exp(-1/g(t)), \quad g(t) = \beta g(t-1), \quad 0 < \beta < 1, \quad g(0) = T_0, \quad T_0 > 0,$$
where $P_0$ is the initial probability of mutation.
Thus, in the early stages of the genetic algorithm, a large step mutation occurs with
high probability, which provides an investigation of the entire search space, and in the
final stages, the probability of mutation and its step tend to zero, which makes the
search directed.
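The decay of the mutation step and of the mutation probability can be sketched in Python. The function names are hypothetical. Which side of 0.5 selects the upward or the downward step is an assumption here, as is the closed form used for the geometric cooling of g(t).

```python
import math

def mutation_step(h, lo, hi, t, T, b, r):
    """Size of the non-uniform mutation step for a gene h in [lo, hi]."""
    shrink = r * (1.0 - t / T) ** b          # tends to 0 as t approaches T
    # ASSUMPTION: r < 0.5 moves toward the maximum, otherwise toward the minimum
    return (hi - h) * shrink if r < 0.5 else (h - lo) * shrink

def mutation_probability(p0, g0, beta, t):
    """P_m = P_0 * exp(-1 / g(t)) with g(t) = g0 * beta**t."""
    g = g0 * beta ** t
    return p0 * math.exp(-1.0 / g)

# Usage with hypothetical settings
step_start = mutation_step(h=2.0, lo=0.0, hi=10.0, t=0, T=100, b=2.0, r=0.3)
step_end = mutation_step(h=2.0, lo=0.0, hi=10.0, t=100, T=100, b=2.0, r=0.3)
p_start = mutation_probability(0.1, 10.0, 0.9, 0)
p_late = mutation_probability(0.1, 10.0, 0.9, 50)
```

Both quantities shrink with the iteration number, which matches the shift from exploring the whole space to a directed search.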
4.6 Choice of reduction operator
The reduction operator makes it possible to create a new population based on the previous population and the parameter vectors obtained by crossing-over and mutation. As a reduction operator, a $(\mu + \lambda)$ scheme is applied that does not require scaling and can be used to minimize fitness function.
4.7 Definition of a stop condition
The following condition is proposed in the work
$$1 - \max_i F(h_i) < \varepsilon \ \vee\ t \ge T.$$
The values of $\varepsilon$ and $T$ are determined experimentally.
5 Numerical research
Table 1 presents the probabilities of person identification by voice obtained on the TIMIT database with an artificial neural network of the multilayer perceptron type and with the proposed method. The artificial neural network had two hidden layers (each consisting of six neurons, like the input layer).
According to Table 1, the proposed method gives the best results.
Table 1. The probability of biometric identification of a person by voice.
Method | Identification probability
Artificial neural network | 0.8
Proposed method | 0.98
6 Conclusions
1. To solve the problem of increasing the efficiency of the digital content processing system for biometric identification of a person by voice, the corresponding speaker recognition methods have been investigated. These studies have shown that today the use of artificial neural networks in combination with a fuzzy inference system and a genetic algorithm is the most effective method.
2. The proposed method of digital content processing for biometric identification of a person by voice automates the generation of digital content features; provides a representation of knowledge in the form of rules that are easily accessible for human understanding and simplifies the determination of the structure of the model due to the fuzzy inference system; reduces the probability of falling into a local extremum and provides an acceptable speed of determining the parameter values of the model by choosing an effective structure of the genetic algorithm; allows parallel processing of information due to the artificial neural network.
3. As a result of a numerical study, it has been found that the proposed method of digital content processing provides a 0.98 probability of biometric identification of a person by voice, which exceeds the probability obtained by an artificial neural network of the multilayer perceptron type.
4. The proposed method of digital content processing for biometric identification of a person by voice can be used in various intelligent systems for digital content processing.
References
1. Bolle, R.M., Connell, J., Pankanti, S., Ratha, N.K., Senior, A.W.: Guide to biometrics. Springer, New York (2004).
2. Jain, A.K., Flynn, P., Ross, A. (Eds.): Handbook of biometrics. Springer, New York, NY (2008).
3. Dunstone, T., Yager, N.: Biometric system and data analysis: design, evaluation, and data mining. Springer, New York (2009).
4. Singh, N., Khan, R., Shree, R.: Applications of Speaker Recognition. Procedia Engineering. 38, 3122–3126 (2012). doi: 10.1016/j.proeng.2012.06.363
5. Li, Q.: Speaker authentication. Springer-Verlag Berlin Heidelberg, Heidelberg (2012).
6. Keshet, J., Bengio, S.: Automatic speech and speaker recognition: large margin and kernel methods. John Wiley & Sons, Chichester (2009).
7. Herbig, T., Gerl, F., Minker, W.: Self-learning speaker identification: a system for enhanced speech recognition. Springer, Berlin (2013).
8. Campbell, J.: Speaker recognition: a tutorial. Proceedings of the IEEE. 85, 1437–1462 (1997). doi: 10.1109/5.628714
9. Togneri, R., Pullella, D.: An Overview of Speaker Identification: Accuracy and Robustness Issues. IEEE Circuits and Systems Magazine. 11, 23–61 (2011). doi: 10.1109/MCAS.2011.941079
10. Beigi, H.: Fundamentals of speaker recognition. Springer, New York (2011).
11. Reynolds, D.A.: An overview of automatic speaker recognition technology. IEEE International Conference on Acoustics Speech and Signal Processing. 4, 4072–4075 (2002). doi: 10.1109/ICASSP.2002.5745552
12. Kinnunen, T., Li, H.: An overview of text-independent speaker recognition: From features to supervectors. Speech Communication. 52, 12–40 (2010). doi: 10.1016/j.specom.2009.08.009
13. Reynolds, D., Rose, R.: Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing. 3, 72–83 (1995). doi: 10.1109/89.365379
14. Zeng, F.-Z., Zhou, H.: Speaker Recognition based on a Novel Hybrid Algorithm. Procedia Engineering. 61, 220–226 (2013). doi: 10.1016/j.proeng.2013.08.007
15. Jeyalakshmi, C., Krishnamurthi, V., Revathi, A.: Speech recognition of deaf and hard of hearing people using hybrid neural network. 2010 2nd International Conference on Mechanical and Electronics Engineering. (2010). doi: 10.1109/ICMEE.2010.5558589
16. Nayana, P., Mathew, D., Thomas, A.: Comparison of Text Independent Speaker Identification Systems using GMM and i-Vector Methods. Procedia Computer Science. 115, 47–54 (2017). doi: 10.1016/j.procs.2017.09.075
17. Chauhan, V., Dwivedi, Sh., Karale, P., Potdar, S.M.: Speech to text converter using Gaussian mixture model (GMM). International Research Journal of Engineering and Technology (IRJET). 3, 160–164 (2016).
18. Reynolds, D.A.: Automatic speaker recognition using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing. 3, 1738–1752 (1995).
19. Fedorov, E., Lukashenko, V., Utkina, T., Rudakov, K., Lukashenko, A.: Method for parametric identification of Gaussian mixture model based on clonal selection algorithm. CEUR Workshop Proceedings. 2353, 41–55 (2019).
20. Larin, V.J., Fedorov, E.E.: Combination of PNN network and DTW method for identification of reserved words, used in aviation during radio negotiation. Radioelectronics and Communications Systems. 57, 362–368 (2014). doi: 10.3103/S0735272714080044
21. Rabiner, L.R., Juang, B.-H.: Fundamentals of speech recognition. Pearson Education, Delhi (2005).
22. Markel, J.D., Gray, A.H.: Linear prediction of speech. Springer-Verlag, Berlin (1976).