=Paper=
{{Paper
|id=Vol-2268/paper13
|storemode=property
|title=Comparative Analysis of Classification Methods for Human Identification by gait
|pdfUrl=https://ceur-ws.org/Vol-2268/paper13.pdf
|volume=Vol-2268
|authors=Lubov Shiripova,Evgeny Myasnikov
|dblpUrl=https://dblp.org/rec/conf/aist/ShiripovaM18
}}
==Comparative Analysis of Classification Methods for Human Identification by gait==
Lubov Shiripova and Evgeny Myasnikov

Samara University, Moskovskoe Shosse 34, Samara, Russia, 443086

mevg@geosamara.ru http://ssau.ru

===Abstract===
The paper considers the problem of identifying a person by gait using a video sequence. It provides a comparative analysis of two methods. The first method, proposed in this paper, consists of detecting a moving person in a video sequence, followed by size normalization, generation of subsequences, and dimensionality reduction using the principal component analysis technique. The person classification is carried out using the support vector machine. The second method is the known approach based on the hidden Markov model. The CASIA GAIT dataset is used to compare the two methods. It is shown that the proposed method outperforms the HMM-based technique and provides high classification accuracy on the considered dataset.

Keywords: gait analysis, person identification, dimensionality reduction, SVM, HMM

===1 Introduction===
Identification of a person by biometric parameters is now widely used around the world. Such specific features as a face image, voice timbre, fingerprint, iris pattern, and even gait are used for identification. Although fingerprints and iris patterns make it possible to identify a person with little or no error, contactless and remote identification methods are of considerable interest. In this regard, the problem of recognizing a person by gait is especially important.

Considering gait as a set of poses and movements, we can distinguish the two most common ways of recording (capturing) such information: video [1] (for example, in the optical range) and recording using sensors located on a human body [2].
In addition, there are works in which gait analysis is based on the readings of accelerometers built into a smartphone [3]. Since gait makes it possible to identify a person even when other methods are not applicable (the person is at a distance, a high-quality image of the face cannot be obtained, etc.), the use of video, for example from CCTV cameras, is of particular interest.

To date, various methods have been used to identify a person in video by gait.

The approach used in [4] consists of sequential segmentation of video frames using the background subtraction algorithm based on a mixture of Gaussian distributions (GMM), dimensionality reduction using the principal component analysis technique (PCA), and classification based on the Fisher linear discriminant (FLDA). Another feature of that work is the combination of movement features with the features of a person's trace (footprints).

The first step of the approach proposed in [5] is an improved background subtraction procedure. In that paper, the selected motions are described by descriptors based on the statistical shape analysis (Procrustes analysis) technique. The supervised classification procedure is constructed using the appropriate measure (Procrustes distance).

In [1], the efficiency of linear (PCA) and non-linear (ISOMAP, LLE) dimensionality reduction techniques is analyzed. A Hidden Markov Model (HMM) is used to classify persons using the features generated with these techniques.

In [6], the Support Vector Machine (SVM) is used to classify persons by gait. In particular, the paper studies the dependence of the classification accuracy on the type of the SVM kernel.

In general, it is worth noting that the problem of recognizing a person by gait attracts the attention of an increasing number of researchers.
Considerable attention is paid both to feature description techniques and to the choice of effective classification methods.

In this paper, to solve the problem of identifying a person by gait, we follow the general approach used in the above studies [1, 4, 6]. The method is based on the detection and segmentation of a moving person in a video sequence, normalization of the frame size, generation of subsequences, and dimensionality reduction of the subsequences using the principal component analysis technique. The support vector machine is used as a classifier. The proposed approach differs from those in the above-mentioned papers.

The proposed technique is compared to the HMM-based technique. The paper shows that the proposed method provides higher classification accuracy with a relatively small number of classes.

The paper has the following structure. Section 2 is devoted to the description of the methods used in the paper. Section 3 describes the results of the experiments. The paper ends with the conclusion, followed by the list of references.

===2 Methods===
====2.1 Identification of a moving person using principal components analysis and support vector machine====
The method developed in this paper consists of the following steps (see Figure 1):

- detection and segmentation of a moving person in the video sequence,

- normalization of the size of detected fragments,

- generation of subsequences of video frames and dimensionality reduction of the generated subsequences,

- classification of video sequences.

Fig. 1. Main steps of the proposed method

Detection and segmentation of a moving person on a video sequence. At the first stage of the developed method, a moving person is detected in a video sequence. Background subtraction methods are most frequently used to detect moving objects when the video sequence is obtained from a video surveillance camera.
The main idea of the methods of this class is to use a certain background model and to decide whether a particular pixel belongs to the background or to a moving object. This decision is based on how well the pixel corresponds to the background model. The background model is gradually refined. Although a time-averaged image can be used as a background model in the simplest applications, more complex models give better results in this problem, for example [7–9].

In this paper, we use the background subtraction algorithm based on a mixture of Gaussian distributions (Gaussian mixture model, GMM) [8]. According to this method, each background pixel is modeled by a weighted sum (mixture) of Gaussians. The weights of the Gaussians are determined by the time periods during which the corresponding colors are present in the video sequence.

In choosing this particular background subtraction technique, we took into account both our preliminary experiments and the experience of other researchers with this method [4, 5].

Upon completion of the first stage of the method, a sequence of masks corresponding to the individual frames of the video sequence is formed. Each mask reflects the result of segmenting the corresponding video frame into the foreground region (moving person) and the background.

Normalization of the size of detected fragments. At the second stage of the method, the obtained masks are processed as follows. First, the center of mass of each foreground region is calculated. Then the linear dimensions (size) of the region are determined, and the mask image is cropped to the region. After that, the cropped image is resized to the specified size. The described scheme is shown in Figure 2.

Fig. 2. Normalization of the size of the selected video sequence fragment

Taking into account the time coordinate, the dimensionality of the sequence of masks, which describes the movement of a person, remains high even after size normalization.
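The size normalization stage described above can be sketched as follows. This is only an illustrative reading of the text, not the authors' C++/OpenCV implementation: the function name `normalize_mask`, the choice to center the crop horizontally on the center of mass, the zero padding outside the frame, and the nearest-neighbor resizing are all assumptions made for the example.

```python
import math


def normalize_mask(mask, out_h, out_w):
    """Crop a binary mask around its foreground region and resize it.

    mask is a list of rows of 0/1 values (hypothetical input format).
    Assumption: the crop spans the full height of the region and is
    centered horizontally on the center of mass of the foreground.
    """
    rows, cols = len(mask), len(mask[0])
    fg = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not fg:
        raise ValueError("mask contains no foreground pixels")

    # Horizontal center of mass of the foreground region.
    cx = sum(c for _, c in fg) / len(fg)

    # Linear dimensions of the region.
    top = min(r for r, _ in fg)
    bottom = max(r for r, _ in fg)
    left = min(c for _, c in fg)
    right = max(c for _, c in fg)

    # Symmetric window around the center of mass covering the region.
    half = max(cx - left, right - cx)
    x0 = math.floor(cx - half)
    x1 = math.ceil(cx + half)
    crop_h = bottom - top + 1
    crop_w = x1 - x0 + 1

    def pixel(r, c):
        # Pixels outside the original frame are treated as background.
        return mask[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    # Nearest-neighbor resize of the cropped window to out_h x out_w.
    return [
        [1 if pixel(top + i * crop_h // out_h, x0 + j * crop_w // out_w) else 0
         for j in range(out_w)]
        for i in range(out_h)
    ]
```

Applying the function to every mask of a sequence yields frames of identical size, which is what allows the frames of a subsequence to be concatenated into fixed-length feature vectors at the next stage.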
In this regard, the third stage reduces the dimensionality of the data describing the movement of a person.

Dimensionality reduction using the principal component analysis technique. Both linear and nonlinear methods are used to reduce the dimensionality of multidimensional data. The most commonly used are linear methods, such as principal component analysis (PCA) [11] and independent component analysis (ICA). Nonlinear dimensionality reduction methods [12] (for example, nonlinear mapping, ISOMAP, LLE) are used less often due to their high computational complexity. It should be noted that attempts have recently been made to accelerate such methods [13, 14].

In this paper, we use the principal component analysis technique as the one most often used in such cases (see, for example, [1, 4]). This method searches for a linear projection onto a subspace of smaller dimension that maximizes the variance of the data. PCA is often regarded as the linear dimensionality reduction technique that minimizes the loss of information.

Before reducing the dimensionality of the data, we form a set of subsequences of a fixed length for each sequence of frames. To do this, we successively select subsequences of the predefined length k with the step s, starting from the beginning of the whole sequence (see Figure 3).

Fig. 3. Parameters of the allocated subsequences

For each selected subsequence, the feature vector is formed as follows: each normalized frame of the subsequence is flattened into a row, and the rows obtained for the individual frames are concatenated.

The feature vectors of all sequences for different persons form the input matrix for the principal component analysis technique. When the principal components are found, the projection of the feature vectors onto the first N principal components is taken as the feature description.

Classification of video sequences.
The features obtained as a result of the principal component analysis are used to train the support vector machine (SVM) [15] classifier. In the considered case, the classes correspond to individual persons, and the feature vectors obtained for all the subsequences correspond to individual observations (examples).

The description given above is valid for the training mode, in which the parameters of the dimensionality reduction procedure (PCA) and of the classifier (SVM) are configured. In the testing mode, the data is processed in the same way, except that the parameters of the linear transformation used to reduce the dimensionality are fixed to the values obtained in the training mode, and the classification is performed by the trained SVM classifier.

====2.2 Identification of a moving person using hidden Markov models====
In this paper, we consider the method described in [16] as an alternative approach. In this section, we give a brief description of this approach.

A Hidden Markov Model (HMM) is defined as λ = (π, A, B), where π is the initial state distribution, A is the state transition probability matrix, and B is the observation probability matrix. As the gait of a person consists of a sequence of cyclic movements, the HMM can be naturally adapted to the considered task.

To build the HMM, one has to determine the number n of states <math>s_i</math>, <math>i = 1..n</math>, of the model and estimate the initial π, transition A, and observation B probability matrices.

According to [16], we define the set of states <math>s_i</math> in such a way that the system successively passes through all these states at equal time intervals in one cycle of motion.

For every state <math>s_i</math>, <math>i = 1..n</math>, we estimate the corresponding exemplar image (person's stance) <math>e_i</math>, <math>i = 1..n</math>, using the sequence of silhouettes (masks) extracted from a video. These exemplar images are then used to estimate the observation probabilities.

HMM initialization. The HMM is built for each person separately.
The initial state probabilities are set to <math>\pi_i = 1/n</math>, <math>i = 1..n</math>, as the first frame of an observed sequence can correspond to any state <math>s_i</math>. The transition probability matrix <math>A = \{a_{i,j}\}</math> is defined as: <math>a_{i,i} = 0.5</math> and <math>a_{i,i+1} = 0.5</math> for <math>1 \le i \le n - 1</math>, and <math>a_{n,1} = 0.5</math>. The observation probability matrix <math>B = \{b_{i,j}\}</math>, <math>i = 1..n</math>, <math>j = 1..m</math>, for a given set of frames <math>X = \{x_j\}</math>, <math>j = 1..m</math>, is defined as

<math>b_{i,j} = \alpha e^{-\beta D(e_i, x_j)}.</math> (1)

Here <math>\alpha</math> and <math>\beta</math> are constants, and <math>D(\cdot,\cdot)</math> is a distance metric. In this paper, we used the Euclidean distance as the metric. The above-mentioned constants are selected so that <math>\sum_{j=1}^{m} b_{i,j} = 1</math>, <math>i = 1..n</math>.

The described HMM is shown in Figure 4.

Fig. 4. The hidden Markov model for person identification.

As can be seen from the above expression, we first need to define the exemplar images <math>e_i</math>, <math>i = 1..n</math>, to estimate the observation probabilities. To do this, we use the following procedure proposed in [16]. At the first step, for each frame <math>x_j</math>, <math>j = 1..m</math>, of a sequence, we compute the mass <math>g_j</math>, <math>j = 1..m</math>, of the corresponding silhouette (mask) image. Then the mass values <math>g_j</math> are filtered with a median filter, and the local minima are estimated. These minima allow us to partition the whole sequence <math>x_j</math>, <math>j = 1..m</math>, into a number of gait cycles. After the boundaries of the gait cycles are determined, we divide each cycle into n intervals of equal (up to one frame) length, corresponding to the HMM states <math>s_i</math>, <math>i = 1..n</math>. Finally, the exemplar images <math>e_i</math>, <math>i = 1..n</math>, are estimated by averaging the silhouette (mask) images of the corresponding intervals over all gait cycles. Examples of estimated exemplar images are shown in Figure 5.

Fig. 5. Exemplar images estimated for two different persons.

HMM training. The initial HMM described above can be further refined according to [16] using the following algorithm:

1. The most probable sequence of states is estimated for each training sequence of frames using the Viterbi algorithm [17].

2.
The exemplar images are refined according to the estimate made at the previous step of the algorithm.

3. The transition probability matrix A is refined using the Baum-Welch algorithm [17] (the observation probabilities B are recalculated using the re-estimated exemplar images).

Classification of video sequences. Given a number of trained HMMs <math>\lambda_k</math>, <math>k = 1..K</math>, and a test frame sequence <math>X = \{x_j\}</math>, <math>j = 1..m</math>, we select the model that gives the maximum probability <math>p(X|\lambda_k)</math> of producing the test sequence. The probability for each trained model is computed using the forward algorithm [17].

===3 Experiments===
The methods described above were implemented in C++ using the OpenCV library. A PC based on an Intel Core i5-3470 CPU at 3.2 GHz was used for the experimental studies.

For the experimental study, video sequences from the open CASIA GAIT dataset [18] were used. This dataset contains sequences of binary images with silhouettes of moving persons. In this work, we used sequences of 25 persons in which the shooting angle is 90 degrees and people are depicted in normal clothes and without bags. There were 6 sequences in each class. The length of each sequence was not less than 60 frames. The sequences of each class were divided into training and test samples of 3 sequences each. To estimate the quality of the considered methods, we used the classification accuracy, defined as the proportion of correctly classified sequences.

For the method proposed in Section 2.1, we studied the dependence of the classification accuracy on the dimensionality of the feature vectors (the output dimensionality of the PCA technique). The results of the experiments are shown in Figure 6. As can be seen from the figure, the best values of the classification accuracy are achieved for 64-dimensional feature vectors. The increase in dimensionality is accompanied by the expected increase in processing time, although the changes are not very significant.

Fig. 6.
Dependence of the classification accuracy (top) and processing time (bottom) on the dimensionality of feature vectors.

In the second experiment, we considered the dependence of the classification accuracy on the number of classes (persons). The experiment was carried out for 5, 10, 15, 20, and 25 classes, with the other parameters fixed. In particular, the step s was equal to 2 frames, the maximum shift m of the beginning of the extracted subsequences was equal to 15 frames, and the dimensionality of the feature vectors was equal to 64. The results of the experiment are shown in Figure 7. This experiment was conducted on the first three sequences of randomly selected classes, with subsequent averaging of the results.

Fig. 7. Dependence of the classification accuracy (top) and processing time (bottom) on the number of classes for the proposed method.

It is worth noting that a direct experimental comparison to other works is difficult because of the different datasets used, as well as potential differences in the experimental conditions. The approach closest to the proposed one is described in [6]. Depending on the classifier configuration, the authors of [6] reported accuracy from 92.08% to 98.79% for the case with ten objects. In this paper, we achieved a classification accuracy of 98.80% for 25 classes, validated over the twenty possible combinations of three training sequences. Thus, it can be said that the results obtained in this paper correspond to the current state of the considered field of research.

As can be seen in Figure 7, the processing time grows as the number of classes increases. A considerable part of the time is spent in the training mode. This fact becomes especially important in scenarios where the number of classes changes dynamically and the system has to be re-trained regularly.

To compare the proposed method to the known approach, we implemented the HMM-based technique described in Section 2.2.
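For orientation, the scoring step of the HMM-based technique of Section 2.2 can be sketched as below. This is a minimal illustration and not the implementation used in the experiments: frames and exemplars are plain lists of numbers, all function names are hypothetical, and the Viterbi and Baum-Welch refinement is omitted. Two points are assumptions rather than statements of the paper: the last state is given a self-transition of 0.5 so that each row of A sums to one, and the forward recursion is rescaled at every step and accumulated as a log-probability to avoid numerical underflow.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cyclic_transitions(n):
    """Transition matrix: stay with 0.5, move to the next state with 0.5.

    Assumption: the last state also keeps 0.5 on the diagonal and
    returns to the first state with 0.5, so every row sums to one.
    """
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] += 0.5
        A[i][(i + 1) % n] += 0.5
    return A


def observation_matrix(exemplars, frames, beta=1.0):
    """b[i][j] proportional to exp(-beta * D(e_i, x_j)), rows sum to one."""
    B = []
    for e in exemplars:
        d = [euclidean(e, x) for x in frames]
        d_min = min(d)  # shifting by d_min cancels out after normalization
        w = [math.exp(-beta * (v - d_min)) for v in d]
        total = sum(w)
        B.append([v / total for v in w])
    return B


def log_likelihood(exemplars, frames, beta=1.0):
    """Scaled forward algorithm: returns log p(X | model)."""
    if not frames:
        raise ValueError("empty frame sequence")
    n = len(exemplars)
    A = cyclic_transitions(n)
    B = observation_matrix(exemplars, frames, beta)
    alpha = [B[i][0] / n for i in range(n)]  # uniform initial distribution
    log_p = 0.0
    for t in range(len(frames)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][t]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def classify(models, frames, beta=1.0):
    """Pick the person whose model gives the highest likelihood.

    models maps a person label to that person's list of exemplars.
    """
    return max(models, key=lambda k: log_likelihood(models[k], frames, beta))
```

A call such as `classify({"person_a": exemplars_a, "person_b": exemplars_b}, test_frames)` returns the label of the best-scoring model; the labels and variable names here are placeholders.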
The results of the experiments are shown in Figure 8. As can be seen, the HMM-based approach provides acceptable accuracy for the considered numbers of classes, but these values are significantly lower than the accuracy demonstrated by the proposed approach.

Fig. 8. Dependence of the classification accuracy on the number of classes for the HMM-based method.

===4 Conclusion===
In this paper, we proposed a method for human identification by gait. The proposed method consists of the detection of a moving person in a video sequence, followed by size normalization, generation of subsequences, dimensionality reduction using the principal component analysis technique, and classification using the support vector machine. The experiments performed on the CASIA GAIT dataset made it possible to determine the best values of the parameters of the proposed method and to compare it to the HMM-based technique. It was shown that the proposed method outperforms the implemented HMM-based technique and provides high classification accuracy on the considered dataset. In particular, for 25 classes, the accuracy was 98.8%, which corresponds to the current state of research.

The drawbacks of the proposed method include its long operating time. In this connection, speeding up the method is a promising line of research.

===Acknowledgments===
The reported study was funded by RFBR according to the research project no. 17-29-03190-ofi.

===References===
1. H. Josinski, A. Switonski, A. Michalczuk, D. Kostrzewa, K. Wojciechowski: Feature Extraction and HMM-Based Classification of Gait Video Sequences for the Purpose of Human Identification. Vision Based Systems for UAV Applications. Studies in Computational Intelligence, Vol. 481, pp. 233-245. (2013)

2. J. Suutala, K. Fujinami, J. Roning: Gaussian Process Person Identifier Based on Simple Floor Sensors. European Conference on Smart Sensing and Context, pp. 55-68. (2008)

3. D. Dingbo, G. Guangyu, H.
Chi, Jian Ma: Automatic Person Identification in Camera Video by Motion Correlation. Hindawi Publishing Corporation Journal of Sensors, 838751. (2014)

4. C. Murukesh, K. Thanushkodi, P. Padmanabhan, Mohamed D. Feroze, Naina: Secured Authentication through Integration of Gait and Footprint for Human Identification. Journal of Electrical Engineering and Technology, Vol. 9(6), pp. 2118-2125. (2014)

5. L. Wang, T. Tan, W. Hu, H. Ning: Automatic Gait Recognition Based on Statistical Shape Analysis. IEEE Transactions on Image Processing, Vol. 12(9), pp. 1120-1131. (2003)

6. P.B. Shelke, P.R. Deshmukh: Person Identification Using Gait: SVM Classifier Approach. International Journal of Emerging Technologies and Engineering (IJETE), Vol. 1(10). (2014)

7. P. KaewTraKulPong, R. Bowden: An improved adaptive background mixture model for real-time tracking with shadow detection. Video-Based Surveillance Systems, pp. 135-144. (2002)

8. Z. Zivkovic: Improved adaptive Gaussian mixture model for background subtraction. Proc. of the 17th Int. Conf. on Pattern Recognition Cambridge, Vol. 2, pp. 28-31. (2004)

9. A.B. Godbehere, A. Matsukawa, K. Goldberg: Visual Tracking of Human Visitors under Variable-Lighting Conditions for a Responsive Audio Art Installation. (2012)

10. Background Subtraction. https://docs.opencv.org/3.3.0/db/d5c/tutorial_py_bg_subtraction.html

11. K. Fukunaga: Introduction to Statistical Pattern Recognition. London: Academic Press, 2nd edn. (2003)

12. J.A. Lee, M. Verleysen: Nonlinear Dimensionality Reduction. New York: Springer-Verlag. (2007)

13. E.V. Myasnikov: A Nonlinear Method for Dimensionality Reduction of Data Using Reference Nodes. Pattern Recognition and Image Analysis, Vol. 22(2), pp. 337-345. (2012)

14. E.V. Myasnikov: Fast Techniques for Nonlinear Mapping of Hyperspectral Data. Proc. SPIE 10341, Ninth International Conference on Machine Vision (ICMV 2016), 103411D. (2017)

15. C. Cortes, V. Vapnik: Support-vector networks.
Machine Learning, Vol. 20(3), pp. 273-297. (1995)

16. A. Sundaresan, A. RoyChowdhury, R. Chellappa: A hidden Markov model based framework for recognition of humans from gait sequences. Proceedings 2003 International Conference on Image Processing, Vol. 3, pp. II-93-6. (2003)

17. L.R. Rabiner: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77(2), pp. 257-286. (1989)

18. CASIA GAIT dataset. http://www.cbsr.ia.ac.cn/english/Databases.asp.