Gait Classification of Common Pedestrians and Smartphone Zombies Using Micro-Doppler Radar

Kazuki Yasuda 1, Teppei Tsuyuhara 1, Masao Masugi 1 and Kenshi Saho 1,2
1 Ritsumeikan University, 1-1-1 Noji-Higashi, Kusatsu, Shiga, Japan
2 Toyama Prefectural University, 5180 Kurokawa, Imizu, Toyama, Japan

Abstract
In this study, a micro-Doppler radar-based method for classifying gaits with and without smartphone texting while walking (i.e., the classification of common pedestrians and smartphone zombies) is proposed. The motion features in the gaits of participants texting on a smartphone while walking were collected using the micro-Doppler radar as time-frequency distributions (spectrograms), which show the velocity variation of body parts over time. We adopted a method of inputting spectrogram images, obtained from a short-time Fourier transform of the received signals of the Doppler radar, to a convolutional neural network. Experimental results showed that the proposed method achieved accurate classification with over 80 % accuracy.

Keywords
Micro-Doppler radar, gait measurement, gait recognition, texting while walking

1. Introduction
With the increasing use of smartphones, the number of people texting on smartphones while walking (called "smartphone zombies") has increased, which has led to many accidents, such as collisions and falls. Various approaches have been studied to detect people using smartphones while walking, in order to avoid such accidents and to develop warning and collision-avoidance systems. For example, a method that uses built-in smartphone sensors has been proposed [1]. Remote detection of smartphone zombies is a promising approach, and methods for remote detection using cameras and image-processing techniques have been proposed [2, 3]. Another approach using lidar has also been proposed to recognize pedestrian types [4].
However, the detection accuracies of these optical sensor-based techniques depend on the lighting conditions and the subjects' clothing. In addition, the use of a camera raises privacy issues. In contrast, radar techniques can measure motion information without these problems. In particular, the micro-Doppler radar technique is a promising candidate for accurate pedestrian sensing because of its capability to remotely measure the time variation in the velocities of human body parts [5, 6]. Micro-Doppler radar achieves various types of accurate human motion recognition, such as gesture recognition [7], human identification [8], and fall detection [9]. However, micro-Doppler radar applications for detecting the action of texting while walking have not been reported.

In this paper, we present a method for classifying the gaits of common pedestrians and smartphone zombies using micro-Doppler radar data. The micro-Doppler radar data of the gaits of participants walking normally and participants texting on a smartphone while walking were collected and analyzed. An accuracy of more than 80 % was then demonstrated by inputting time-frequency distribution (spectrogram) images calculated from the radar received signals to a convolutional neural network (CNN). We investigated the classification accuracies of representative CNN architectures, such as VGGNet, ResNet, and AlexNet, and identified an efficient CNN model for micro-Doppler radar-based detection of smartphone zombies.

The 4th International Symposium on Advanced Technologies and Applications in the Internet of Things (ATAIT 2022), August 24-26, 2022, Ibaraki, Japan
EMAIL: ri0095xv@ed.ritsumei.ac.jp; masugi@fc.ritsumei.ac.jp; saho@pu-toyama.ac.jp
ORCID: 0000-0003-2088-1231
Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

2. Methods

2.1.
Experimental protocol

Figure 1 shows the experimental conditions of the radar. A 24 GHz continuous-wave (CW) radar (BSS-110, ILT Office, Toyama, Japan) was placed in front of the participant. The radar was a monostatic micro-Doppler radar installed at a height of 60 cm, and the -3 dB beamwidths in the vertical and horizontal planes of the antenna were ±14° and ±35°, respectively. The participants were instructed to walk towards the radar along an 8-m straight walkway. The radar signals, received at a sampling frequency of 600 Hz, were obtained after demodulation using a quadrature detector. We used the acquired time series of in-phase and quadrature components as complex signals. The participants were 18 adults (14 men and 4 women) with ages and heights of 21.5 ± 0.8 years and 168.9 ± 11.5 cm, respectively. Seventeen participants performed the following two types of walking motion 15 times each, and one participant performed them 3 times each: (a) common walking (Figure 1 (a)), in which the participants walked normally at a self-selected comfortable pace, and (b) texting on a smartphone while walking (smartphone zombie) (Figure 1 (b)), in which the participants walked following the instruction "please walk while reading texts on your smartphone." The aim of this study is to classify the radar data of walking motion types (a) and (b).

Figure 1: Experimental situations. (a) Common walking and (b) texting while walking (smartphone zombie).

2.2. Dataset generation

We generated images of the time-frequency distribution (spectrogram images) of the received signals using a short-time Fourier transform (STFT), similar to our previous study [8]. Before the STFT process, we applied a Butterworth high-pass filter to the received signals to remove the zero-frequency components corresponding to static targets.
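As a rough illustration of this static-clutter removal step, the following sketch applies a Butterworth high-pass filter to a simulated complex I/Q signal with a DC offset standing in for static-target echoes. The cutoff frequency, filter order, and simulated signal are assumptions for illustration only; the paper does not report these settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 600.0          # sampling frequency [Hz], as in the experiment
rng = np.random.default_rng(0)
iq = rng.standard_normal(6000) + 1j * rng.standard_normal(6000)
iq += 5.0           # DC offset standing in for static-target echoes

# 4th-order Butterworth high-pass with a 2 Hz cutoff (assumed values);
# filter the in-phase and quadrature components separately
b, a = butter(N=4, Wn=2.0, btype="highpass", fs=fs)
filtered = filtfilt(b, a, iq.real) + 1j * filtfilt(b, a, iq.imag)

print(abs(np.mean(iq)))        # large: zero-frequency component present
print(abs(np.mean(filtered)))  # near zero after high-pass filtering
```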
We then calculated the STFT of the received signal as

S(t, f) = ∫ s(τ) w(τ − t) exp(−j2πfτ) dτ,   (1)

where s(t) is the high-pass-filtered received signal, w(t) is a window function, and f is the Doppler frequency corresponding to the target velocity v. The relationship between f and v is

f = 2v/λ,   (2)

where λ denotes the wavelength of the transmitted signal. Because this study used a 24 GHz continuous wave, λ was 12.5 mm. For w(t), we empirically used the commonly used Hamming window function with a length of 21.7 ms. We calculated the spectrogram as |S(t, f)|^2. Subsequently, we removed the components whose received powers were smaller than 3 dB/Hz as noise in |S(t, f)|^2, trimmed the spectrogram corresponding to steady-state walking, and converted the trimmed spectrograms into RGB-colored PNG images of size 164 × 218. We used the generated (17 (participants) × 15 (walks) + 1 (participant) × 3 (walks)) × 2 (types of motion) = 516 PNG images as the dataset for this study.

Figure 2 shows examples of the generated spectrogram images of walking motion types (a) and (b). The frequencies of the spectrogram correspond to the target velocity; the components in the spectrogram images with larger received power correspond to the motion of the torso, and the accompanying components with relatively large frequency variations correspond to the forward motion of the legs. Although motion type (b) (smartphone zombie) tended to have a slightly lower frequency than motion type (a) (common walking), clear differences between the two types could not be confirmed in the spectrogram images. This is because the essential differences between these two types of walking motion were less visible in the spectrograms. The simplest difference between the types was the presence of an arm-swinging motion. However, the arm motions were not clearly confirmed in the spectrograms because the received powers of the echoes from the arms were quite small compared to those from the torso and legs [10].
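The spectrogram-generation steps above (the STFT of Equation (1), the 21.7 ms Hamming window, and the Doppler-to-velocity conversion of Equation (2)) can be sketched as follows. The simulated torso echo and its velocity profile are illustrative assumptions; the noise thresholding, trimming, and image conversion steps are omitted.

```python
import numpy as np
from scipy.signal import stft

fs = 600.0                    # sampling frequency [Hz]
wavelength = 0.0125           # 24 GHz CW -> 12.5 mm

# Simulated echo of a torso approaching at ~1.2 m/s with slight
# periodic variation (an assumed stand-in for a real walking echo)
t = np.arange(0, 5, 1 / fs)
v_torso = 1.2 + 0.1 * np.sin(2 * np.pi * 2 * t)       # velocity [m/s]
phase = 4 * np.pi / wavelength * np.cumsum(v_torso) / fs
sig = np.exp(1j * phase)

nperseg = int(round(0.0217 * fs))  # 21.7 ms Hamming window
f, tt, S = stft(sig, fs=fs, window="hamming", nperseg=nperseg,
                return_onesided=False)
power = np.abs(S) ** 2             # spectrogram |S(t, f)|^2
velocity = f * wavelength / 2      # Eq. (2): v = f * lambda / 2

# The dominant Doppler bin should sit near the torso velocity
peak_v = velocity[np.argmax(power.mean(axis=1))]
print(round(peak_v, 2))
```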
Additionally, the velocities of the arms were similar to those of the legs. Thus, the components corresponding to the motions of the arms were mixed with the background noise and the components corresponding to the leg motions. Another difference between the two types of walking motion was the position of the head. However, acquiring this head (or neck) position difference was difficult for the Doppler radar because it measures only velocity information, and position-related information was only slightly included in the spectrograms (as very slight differences in the received powers corresponding to the head and neck). Thus, the differences between the spectrogram images of common pedestrians and smartphone zombies were not clear.

Figure 2: Examples of spectrogram images. (a) Common walking and (b) smartphone zombie.

2.3. Gait classification using CNN and accuracy evaluation

To extract and use the slight differences between the spectrogram images of the two groups, we used a CNN for feature extraction and gait classification. Figure 3 shows an outline of the proposed classification method. The CNN was used as the deep-learning method for classification, similar to previous studies on Doppler radar-based human motion recognition [5, 6]. The generated spectrogram images were used as the input data for the CNN. We investigated the classification performance of CNNs with the representative architectures LeNet [11], AlexNet [12], ResNet-18 [13], and VGGNet [14] because they are effective for radar-based human motion recognition problems. In addition, ResNet-50 [15] was considered to investigate the effectiveness of a deeper network. The hyperparameters of each network were empirically optimized. As an example, the hyperparameters for AlexNet were as follows: the loss function was the cross-entropy function, and stochastic gradient descent with momentum was used as the optimization algorithm. We trained for 50 epochs with a batch size of 64, and the learning rate was 0.01.
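A minimal sketch of this training setup, using the AlexNet hyperparameters reported above (cross-entropy loss, SGD with momentum, learning rate 0.01, batch size 64). The small convolutional stack, random stand-in images, and reduced iteration count are placeholders for illustration, not the actual AlexNet architecture or dataset.

```python
import torch
import torch.nn as nn

# Two classes: (a) common walking, (b) texting while walking
model = nn.Sequential(                       # small AlexNet-style stack
    nn.Conv2d(3, 16, kernel_size=7, stride=2), nn.ReLU(),
    nn.MaxPool2d(3, 2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((6, 6)), nn.Flatten(),
    nn.Linear(32 * 6 * 6, 2),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

x = torch.randn(8, 3, 164, 218)              # stand-ins for PNG spectrograms
y = torch.randint(0, 2, (8,))
for _ in range(3):                           # the paper trains for 50 epochs
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
print(model(x).shape)                        # per-image scores for 2 classes
```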
To evaluate the classification accuracy, we performed hold-out validation. For each architecture, the CNN was trained using 80 % of the generated images, and the remaining 20 % were used as test data. Ten hold-out validation trials were conducted by randomly varying the training and test data, and the mean and standard deviation of the classification accuracies across all trials were calculated.

Figure 3: Outline of the classification method using CNN. Input: spectrogram images. Output: classified walking type.

3. Results and Discussion

Table 1 presents the test results for the classification of walking motion types (a) and (b) using various CNN architectures. LeNet, AlexNet, and ResNet-50 achieved accurate classification with over 80 % accuracy. The highest accuracy of 85.7 % was achieved using AlexNet. The accuracy of ResNet-50 was relatively stable, with a smaller standard deviation. These results indicate that the CNN with spectrogram images accurately classified the gaits of the common pedestrian and smartphone zombie groups, even though the differences between their images were minor. The results for LeNet and AlexNet show that a relatively simple, shallower network can sufficiently capture the gait features of smartphone zombies. However, the results of the relatively deeper ResNet-50 were stable, which implies that such a network may be effective when data from a larger number of participants and training samples are used; this is an important future direction of this study. The effectiveness of the proposed method using AlexNet, which achieved the highest accuracy, is discussed using the confusion matrix and convergence curve. Table 2 presents the confusion matrix for the results using AlexNet and indicates that, in some cases, the use of the smartphone did not change the speed of the torso and leg motions; therefore, these data were not correctly classified. However, a sensitivity of 84 % and a specificity of 86 % were achieved.
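These sensitivity and specificity values follow directly from the per-class percentages in the confusion matrix of Table 2, treating texting while walking as the positive class:

```python
# Per-true-class percentages from the confusion matrix (Table 2)
tp, fn = 84.2, 15.8   # true class (b): correctly / wrongly classified
tn, fp = 85.7, 14.3   # true class (a): correctly / wrongly classified

sensitivity = tp / (tp + fn)   # fraction of smartphone zombies detected
specificity = tn / (tn + fp)   # fraction of common pedestrians recognized
print(round(sensitivity, 3), round(specificity, 3))  # ~0.84 and ~0.86
```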
Thus, the proposed method can accurately screen for smartphone zombies. Figure 4 shows examples of the convergence curves of the proposed method using AlexNet and ResNet-50. For both networks, the test accuracy converged within approximately 50 epochs, and overfitting was not confirmed. Thus, the proposed method effectively trained the classification model. Although AlexNet achieved better accuracy, the steady-state accuracy of ResNet-50 was more stable than that of AlexNet; these results show a tendency similar to that in Table 1, which suggests the possible effectiveness of a deeper network.

Finally, we discuss the mechanism of the accurate gait classification in our results. As shown in Section 2.2, the differences between the images of the two walking types were not clear because the motion differences in the arms and head were not sufficiently measured. However, our results indicated that a classification accuracy of more than 80 % was achieved using the spectrogram images, which mainly reflected the motions of the torso and legs. It appears that the proposed method extracted features corresponding to slight differences in these motions using the CNN. A biomechanical study clarified slight reductions in gait speed and changes in the tibialis anterior and gastrocnemius muscles when young adults text on a smartphone while walking [16]. These apparently affect the motion of the legs and torso. Thus, it can be considered that the features corresponding to these differences could be efficiently learned from the spectrogram images using the CNN in the proposed method.

Table 1
Classification results.

CNN architecture   Mean classification accuracy   Standard deviation of classification
                   among 10 tests [%]             accuracy among 10 tests [%]
LeNet              80.11                          4.67
AlexNet            85.72                          5.64
ResNet-18          79.11                          5.88
ResNet-50          80.42                          2.98
VGG-16             74.71                          4.22

Table 2
Confusion matrix for the classification using AlexNet.
Predicted \ True     Walking type (a)     Walking type (b)
                     (Common walking)     (Texting while walking)
Walking type (a)     85.7 %               15.8 %
Walking type (b)     14.3 %               84.2 %

Figure 4: Examples of convergence curves of the proposed method. (a) AlexNet and (b) ResNet-50.

4. Conclusion

In this study, the gaits of common pedestrians and smartphone zombies were measured using a micro-Doppler radar, and images of the spectrograms (time-frequency distributions) of the radar received signals were used for classification. By applying a CNN with the spectrogram images as the input data, a classification accuracy of approximately 86 % was achieved on the experimental data. Thus, the effectiveness of the proposed method for remotely detecting the action of texting on a smartphone while walking was verified. In our experiments, all participants were approximately 20 years old and were measured in the same experimental environment. Therefore, a future direction of our study is to measure various types of participants of different ages and in different environments. In addition, experiments with multiple-pedestrian situations are important to demonstrate the practicality of the proposed method.

Acknowledgements

We appreciate all the participants for their engagement in this study.

References

[1] T. Wada and A. Shikishima, Real-time detection system for smartphone zombie based on machine learning, IEICE Commun. Express 9.7 (2020): 268-273.
[2] H. Hanaizumi and H. Misono, An OpenPose based method to detect texting while walking, in: Proc. ICISIP2019, Howard International House, Taipei, Taiwan, 2019, pp. 130-134.
[3] A. Rangesh and M. M. Trivedi, When vehicles see pedestrians with phones: A multicue framework for recognizing phone-based activities of pedestrians, IEEE Transactions on Intelligent Vehicles 3.2 (2018): 218-227.
[4] J. Wu et al., Smartphone zombie detection from lidar point cloud for mobile robot safety, IEEE Robot. Automat. Lett. 5.2 (2020): 2256-2263.
[5] H. Arab, I. Ghaffari, L. Chioukh, S. O. Tatu, and S. Dufour, A convolutional neural network for human motion recognition and classification using a millimeter-wave Doppler radar, IEEE Sens. J. 22.5 (2022): 4494-4502.
[6] S. Z. Gurbuz and M. G. Amin, Radar-based human-motion recognition with deep learning: Promising applications for indoor monitoring, IEEE Signal Process. Mag. 36.4 (2019): 16-28.
[7] Z. Yu, D. Zhang, Z. Wang, Q. Han, B. Guo, and Q. Wang, … based on SIMO Doppler radar (2022): 276-289.
[8] K. Saho, K. Shioiri, and K. Inuzuka, … sit-to-stand and stand-to-sit movements measured using Doppler radars, IEEE Sens. J. 21.4 (2020): 4563-4570.
[9] H. Sadreazami, M. Bolic, and S. Rajan, Contactless fall detection using time-frequency analysis and convolutional neural networks, IEEE Trans. Indust. Inform. 17.10 (2021): 6842-6851.
[10] K. Saho, T. Sakamoto, T. Sato, K. Inoue, and T. Fukuda, Experimental study of real-time human imaging using UWB Doppler radar interferometry, in: 2012 6th European Conference on Antennas and Propagation (EUCAP), 2012, pp. 3495-3499.
[11] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86.11 (1998): 2278-2324.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012, pp. 1097-1105.
[13] K. He, X. Zhang, S. Ren, and J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[14] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014).
[15] R. U. Khan, X. Zhang, R. Kumar, and E. O. Aboagye, Evaluating the performance of ResNet model based on image recognition, in: Proc. Int. Conf. Comput. Artif. Intell. (ICCAI), 2018, pp. 86-90.
[16] V. Agostini, F. Lo Fermo, G. Massazza, and M. Knaflitz, Does texting while walking really affect gait in young adults? J. Neuroeng. Rehabil. 12.1 (2015): 1-10.