A Convolutional Neural Network Approach to Classification of Human's Behaviors in a Restroom Using Doppler Radars
Mutsuki Tsuyama¹, Sora Hayashi¹, Kenshi Saho²,¹, Masao Masugi¹
¹ Ritsumeikan University, Shiga, Japan
² Toyama Prefectural University, Toyama, Japan
ri0082ip@ed.ritsumei.ac.jp, sora8840817@gmail.com, saho@pu-toyama.ac.jp, masugi@fc.ritsumei.ac.jp

Abstract
Toward a monitoring system for the early detection of incidents such as a person falling in a restroom, this study introduces a method for classifying behaviors in restrooms using Doppler radars and a convolutional neural network. The proposed system uses data from two Doppler radars installed on the ceiling and on a wall of the restroom to classify behaviors such as taking off and putting on pants, sitting down, standing up, and falling. The assumed behaviors were classified with over 90% accuracy.

1 Introduction
Population aging has made the early detection of falls and syncope in elderly adults increasingly important (World Population Prospects, 2019). In particular, places with privacy concerns, such as restrooms and bathrooms, need to be monitored. For this purpose, camera-based (Gao et al., 2011 IEEE International Symposium on Circuits and Systems, 2011) and depth-sensor-based approaches (Meng et al., 2016 International Conference on Advanced Mechatronic Systems, 2016) have been studied. However, their measurement accuracy depends on the lighting conditions and the subject's clothing. In addition, installing cameras in restrooms is difficult in most cases because of privacy issues. Radar is a promising candidate for resolving these problems because it can detect the subject's behaviors while avoiding privacy concerns and the problems associated with low light and clothing (Ding et al., IEEE J. Emerging and Selected Topics in Circuits and Systems, 2018).
Several studies have achieved accurate motion classification using radar images and deep learning (Zhang and Cao, IEEE Sensors Letters, 2019). For restroom monitoring, a technique for detecting abnormal behaviors based on range information acquired with frequency-modulated continuous-wave (FMCW) radar has been proposed (Takabatake et al., 2019 IEEE Global Communications Conference, 2019). However, that study did not achieve sufficient accuracy for practical use (approximately 60% for the classification of seven classes, such as falling, sitting down, and standing). Furthermore, it considered only classes with relatively large differences in movement and did not address the detailed classification of behaviors in restrooms. In this study, we propose a radar-based classification of human behaviors in a restroom, covering falling and various other behaviors such as taking off pants, taking toilet paper, and opening the toilet lid. The system uses two 24 GHz continuous-wave Doppler radars placed on the ceiling and on a wall behind the subject. The experimental results show accurate classification of the behaviors using a convolutional neural network (CNN) and the combined data of the two radars. The contributions of this study are as follows:
• We demonstrated the practicality of radar technology, which avoids privacy issues, for classifying human behaviors in a restroom.
• The effectiveness of two radars installed on the wall and ceiling of a restroom was verified for the classification of basic behaviors and falls.
• Falling was distinguished from the other behaviors with 100% accuracy.
• The proposed CNN-based method, which fuses the images of the two radars, achieved 95.6% classification accuracy.

2 Experimental Setup
The setup and site of the proposed radar system used to detect the participants' behaviors in a restroom are shown in Figures 1 and 2.
We used 24 GHz continuous-wave radars (ILT office, BSS-110) with ±14° directivity in the plane shown in Figure 1; that is, the radars can measure behaviors within a ±14° forward range. The radars were installed above (ceiling radar) and behind (wall radar) the subject. The received signals were demodulated with a quadrature detector and digitized by an analog-to-digital converter with a sampling frequency of 600 Hz. The participants were 24 young men (age: 22.4 ± 1.1 years; height: 173.8 ± 5.1 cm) who consented to this study. Each participant performed the following eight behaviors three times each: (a) opening the toilet lid, (b) taking off the pants, (c) sitting down, (d) taking the toilet paper, (e) standing up, (f) putting on the pants, (g) closing the toilet lid, and (h) falling, where falling is the motion of falling forward from a seated position.

Figure 1: Radar system setup
Figure 2: Experimental site

3 Classification Method
Figure 3 presents an outline of the classification method. First, the short-time Fourier transform (STFT) of the received signals generates radar spectrogram images. A Hamming window with a length of 128 samples was used for the STFT. The radar spectrogram shows the time-velocity distribution of the measured movements. Figure 4 shows an example spectrogram of behavior (a) calculated from the received signals of the ceiling radar, where positive velocity indicates motion toward the radar and negative velocity indicates motion away from it.

Figure 3: Outline of the classification method
Figure 4: An example of the spectrogram for behavior (a)

Then, the spectrograms were converted to 164 × 218 PNG images with RGB color channels, and the generated images were input to the CNN to classify the eight behaviors. Tables 1 and 2 show the structures of the CNNs with one and two input images, respectively.
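The STFT step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the 600 Hz sampling rate, the 128-sample Hamming window, and the 24 GHz carrier come from the text, while the hop size (`HOP = 16`) and the dB scaling are assumptions, and the sign convention (positive Doppler frequency meaning an approaching target) depends on how the quadrature channels are wired. Doppler frequency is mapped to radial velocity with the standard CW relation f_d = 2v/λ. With complex I/Q sampling at 600 Hz, the unambiguous Doppler range is ±300 Hz, i.e., roughly ±1.87 m/s at 24 GHz (λ ≈ 12.5 mm), and one frequency bin (600/128 ≈ 4.7 Hz) corresponds to about 0.029 m/s.

```python
import numpy as np

# Values stated in the text
FC_HZ = 24e9            # carrier frequency of the CW radar
FS_HZ = 600.0           # ADC sampling frequency
WIN_LEN = 128           # Hamming window length (samples)
# Assumption: the hop between successive windows is not given in the text
HOP = 16
C_M_S = 299_792_458.0   # speed of light (m/s), physical constant


def doppler_spectrogram(iq, fs=FS_HZ, win_len=WIN_LEN, hop=HOP, fc=FC_HZ):
    """STFT of a complex I/Q signal.

    Returns (times_s, velocities_m_s, power_db), where power_db has shape
    (win_len, n_frames): rows are velocity bins, columns are time frames.
    """
    iq = np.asarray(iq, dtype=complex)
    window = np.hamming(win_len)
    n_frames = 1 + (len(iq) - win_len) // hop
    # Each row is one Hamming-windowed frame of the signal.
    frames = np.stack([iq[i * hop:i * hop + win_len] * window for i in range(n_frames)])
    # FFT per frame; fftshift moves zero Doppler to the middle of the axis.
    spec = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    freqs = np.fft.fftshift(np.fft.fftfreq(win_len, d=1.0 / fs))
    wavelength = C_M_S / fc
    velocities = freqs * wavelength / 2.0                    # f_d = 2 v / lambda
    times = (np.arange(n_frames) * hop + win_len / 2) / fs   # frame centers
    power_db = 20.0 * np.log10(np.abs(spec) + 1e-12)
    return times, velocities, power_db.T


# Usage: a synthetic target approaching at 0.5 m/s for 2 s.
t = np.arange(int(2 * FS_HZ)) / FS_HZ
f_doppler = 2 * 0.5 / (C_M_S / FC_HZ)                        # about 80 Hz
times, velocities, power_db = doppler_spectrogram(np.exp(2j * np.pi * f_doppler * t))
peak_velocity = velocities[np.argmax(power_db[:, 0])]
print(f"peak velocity in first frame: {peak_velocity:.3f} m/s")
```

Rendering `power_db` as a color image and resizing it to 164 × 218 pixels would then give an input of the kind the CNN expects; the color map and resizing method are not specified in the text.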
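The spatial output sizes in Table 1 can be checked with simple arithmetic. The text does not state the padding mode; the listed sizes are consistent with "same" padding in the convolution layers (output = ⌈input / stride⌉) and no padding in the max-pooling layers (output = ⌊(input − kernel) / stride⌋ + 1), so this sketch assumes those modes. Batch normalization does not change the shape and is omitted.

```python
import math

# (name, kind, kernel, stride) taken from Table 1; batch-norm layers are
# omitted because they do not change the shape.
LAYERS = [
    ("Convolution 1", "conv", 11, 3),
    ("Maxpooling 1", "pool", 3, 2),
    ("Convolution 2", "conv", 5, 3),
    ("Maxpooling 2", "pool", 3, 2),
    ("Convolution 3", "conv", 3, 1),
    ("Convolution 4", "conv", 3, 1),
    ("Convolution 5", "conv", 3, 1),
    ("Maxpooling 3", "pool", 3, 2),
]


def conv_same(size, stride):
    """ASSUMED 'same' padding: output = ceil(input / stride)."""
    return math.ceil(size / stride)


def pool_valid(size, kernel, stride):
    """ASSUMED no padding: output = floor((input - kernel) / stride) + 1."""
    return (size - kernel) // stride + 1


def trace_spatial_sizes(height=164, width=218):
    """Return {layer name: (height, width)} for the one-input CNN."""
    sizes = {}
    for name, kind, kernel, stride in LAYERS:
        if kind == "conv":
            height, width = conv_same(height, stride), conv_same(width, stride)
        else:
            height, width = pool_valid(height, kernel, stride), pool_valid(width, kernel, stride)
        sizes[name] = (height, width)
    return sizes


for name, size in trace_spatial_sizes().items():
    print(f"{name:14s} {size}")
```

The final 1 × 2 × 128 feature map flattens to 256 values before Fully Connected 1. In the two-input CNN of Table 2, each branch follows the same trace, and the two 2048-dimensional outputs of Fully Connected 1 are concatenated into a 4096-dimensional vector.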
The CNN with one input image uses either the ceiling or the wall radar image, while the CNN with two input images uses the combination of the images from both radars. Thus, the classification accuracy was investigated and compared for three input cases: ceiling radar only, wall radar only, and the fusion of both radars. Both CNNs are composed of convolution, max pooling, batch normalization (Ioffe and Szegedy, the 32nd International Conference on Machine Learning, 2015), and fully connected layers. The CNN with two input images joins the outputs of the first fully connected layers of its two branches in a concatenate layer. Stochastic gradient descent with momentum was used as the optimizer, and the loss function was the cross-entropy function. The network was trained for 100 epochs with a batch size of four. The initial learning rate was 0.01 and was multiplied by 0.5 every 10 epochs. These hyperparameters were tuned empirically.

Layer | Filter Size | Stride | Output Size
Input | - | - | (164,218,3)
Convolution 1 | (11,11) | (3,3) | (55,73,48)
Maxpooling 1 | (3,3) | (2,2) | (27,36,96)
Batch 1 | - | - | (27,36,96)
Convolution 2 | (5,5) | (3,3) | (9,12,128)
Maxpooling 2 | (3,3) | (2,2) | (4,5,128)
Batch 2 | - | - | (4,5,128)
Convolution 3 | (3,3) | (1,1) | (4,5,192)
Convolution 4 | (3,3) | (1,1) | (4,5,192)
Convolution 5 | (3,3) | (1,1) | (4,5,128)
Maxpooling 3 | (3,3) | (2,2) | (1,2,128)
Batch 3 | - | - | (1,2,128)
Fully Connected 1 | - | - | 2048
Fully Connected 2 | - | - | 2048
Fully Connected 3 | - | - | 8
Table 1: Structure of the CNN for one input radar image

Layer | Filter Size | Stride | Output Size (ceiling branch), (wall branch)
Input | - | - | (164,218,3), (164,218,3)
Convolution 1 | (11,11) | (3,3) | (55,73,48), (55,73,48)
Maxpooling 1 | (3,3) | (2,2) | (27,36,96), (27,36,96)
Batch 1 | - | - | (27,36,96), (27,36,96)
Convolution 2 | (5,5) | (3,3) | (9,12,128), (9,12,128)
Maxpooling 2 | (3,3) | (2,2) | (4,5,128), (4,5,128)
Batch 2 | - | - | (4,5,128), (4,5,128)
Convolution 3 | (3,3) | (1,1) | (4,5,192), (4,5,192)
Convolution 4 | (3,3) | (1,1) | (4,5,192), (4,5,192)
Convolution 5 | (3,3) | (1,1) | (4,5,128), (4,5,128)
Maxpooling 3 | (3,3) | (2,2) | (1,2,128), (1,2,128)
Batch 3 | - | - | (1,2,128), (1,2,128)
Fully Connected 1 | - | - | 2048, 2048
Concatenate | - | - | 4096
Fully Connected 2 | - | - | 8
Table 2: Structure of the CNN for two input radar images

4 Performance Evaluation
The classification accuracy for the eight behaviors in the restroom was evaluated with hold-out validation for the three input cases explained in the previous section. Each CNN was trained using 80% of the generated spectrogram images and tested on the remaining 20%. This was repeated for 30 trials, each with a different random division of the dataset into training and test data, and the mean classification accuracy over the trials was calculated.

Table 3 lists the mean classification accuracies of the three input cases. All three cases exceeded 90% accuracy. The CNN using the combination of the two radars achieved 95.6% accuracy, outperforming the single-radar CNNs by 5.3 percentage points (ceiling radar only) and 4.1 percentage points (wall radar only).

The validity and details of the results are examined next using the convergence curves and confusion matrices. Figures 5, 6, and 7 show the convergence curves for all input cases. In all cases, no overfitting was observed, and the accuracies converged in fewer than 40 epochs. The confusion matrices of all cases are shown in Tables 4, 5, and 6, where the values are rounded to two decimal places. With the ceiling radar, the accuracy for "(f) putting on the pants" was the lowest (0.72). With the wall radar, the accuracy for "(b) taking off the pants" was the lowest (0.77). Table 6 indicates that the low accuracy for (b) is not resolved even when the ceiling radar is combined (0.79), whereas the low accuracy for (f) with the ceiling radar is largely resolved by combining the radar images (0.87). Solving this problem to further improve the classification accuracy is an important direction for future studies. Nevertheless, in all cases, the classification accuracy for "(h) falling" was 100%.
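The training ingredients named in Section 3 (cross-entropy loss, stochastic gradient descent with momentum, and a step-decayed learning rate) can be sketched without any deep-learning framework. The initial rate of 0.01, the factor of 0.5, the 10-epoch step, and the 100 epochs come from the text; the momentum coefficient of 0.9 is an assumption because the text does not report it, and the update shown is the common "heavy-ball" form, not necessarily the exact variant used by the authors' framework.

```python
import math

# Values stated in the text
INITIAL_LR = 0.01
DECAY_FACTOR = 0.5
DECAY_EVERY = 10      # epochs between decays
N_EPOCHS = 100
# Assumption: the momentum coefficient is not reported in the text
MOMENTUM = 0.9


def learning_rate(epoch):
    """Step decay; `epoch` is 0-based, so epochs 0-9 use 0.01, 10-19 use 0.005, ..."""
    return INITIAL_LR * DECAY_FACTOR ** (epoch // DECAY_EVERY)


def cross_entropy(logits, label):
    """Softmax cross-entropy of one sample; `label` is the index of the true class."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def sgd_momentum_step(params, grads, velocity, lr, momentum=MOMENTUM):
    """One heavy-ball momentum update: v <- mu*v - lr*g, then p <- p + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


print([learning_rate(e) for e in (0, 10, 20, N_EPOCHS - 1)])
```

With 100 epochs, the rate is halved nine times, so the last 10 epochs run at 0.01 × 0.5⁹ ≈ 2.0 × 10⁻⁵. For eight classes, an untrained network with equal logits starts at a loss of ln 8 ≈ 2.08.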
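The repeated hold-out protocol can be sketched as follows. It assumes one spectrogram (or, for the fusion case, one ceiling/wall image pair) per recorded trial, giving 24 participants × 8 behaviors × 3 repetitions = 576 samples; the text does not state this count explicitly, nor whether the split was stratified by class or grouped by participant, so a plain random split at the sample level is used here. The seed is arbitrary.

```python
import random

# Counts stated in the text; one sample per recorded trial is an ASSUMPTION.
N_PARTICIPANTS, N_BEHAVIORS, N_REPETITIONS = 24, 8, 3
N_SAMPLES = N_PARTICIPANTS * N_BEHAVIORS * N_REPETITIONS  # 576
TEST_FRACTION = 0.2
N_TRIALS = 30


def holdout_splits(n_samples=N_SAMPLES, test_fraction=TEST_FRACTION,
                   n_trials=N_TRIALS, seed=0):
    """Yield (train_indices, test_indices) for repeated random 80/20 hold-out splits."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    for _ in range(n_trials):
        order = list(range(n_samples))
        rng.shuffle(order)
        yield sorted(order[n_test:]), sorted(order[:n_test])


def mean_accuracy(per_trial_accuracies):
    """Average of the per-trial test accuracies (the quantity summarized in Table 3)."""
    return sum(per_trial_accuracies) / len(per_trial_accuracies)


splits = list(holdout_splits())
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))  # 30 461 115
```

Because a random sample-level split can place recordings of the same participant in both the training and test sets, a participant-wise split would be a stricter test of generalization to unseen people.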
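The statement that falling is separated perfectly from the other behaviors can be read directly from the confusion matrices. The sketch below transcribes Table 6 (rows: true label, columns: predicted label) and computes, for a chosen class, its recall and the largest rate at which any other class is predicted as that class. Because the tables are row-normalized and rounded to two decimal places, this check is exact only up to that rounding.

```python
LABELS = ["a", "b", "c", "d", "e", "f", "g", "h"]   # (h) = falling

# Row-normalized confusion matrix of the two-radar fusion, transcribed from Table 6.
FUSION_CM = [
    [1,    0,    0,    0, 0, 0,    0,    0],
    [0,    0.79, 0,    0, 0, 0,    0.21, 0],
    [0,    0,    1,    0, 0, 0,    0,    0],
    [0,    0,    0,    1, 0, 0,    0,    0],
    [0,    0,    0,    0, 1, 0,    0,    0],
    [0.07, 0,    0.07, 0, 0, 0.87, 0,    0],
    [0.07, 0,    0,    0, 0, 0,    0.92, 0],
    [0,    0,    0,    0, 0, 0,    0,    1],
]


def one_vs_rest(cm, cls):
    """Return (recall of cls, largest rate at which another class is predicted as cls)."""
    recall = cm[cls][cls]
    false_alarm = max(row[cls] for i, row in enumerate(cm) if i != cls)
    return recall, false_alarm


def weakest_class(cm, labels=LABELS):
    """Label whose diagonal entry (per-class recall) is smallest."""
    return labels[min(range(len(cm)), key=lambda k: cm[k][k])]


recall, false_alarm = one_vs_rest(FUSION_CM, LABELS.index("h"))
print(f"falling recall = {recall:.2f}, max false-alarm rate = {false_alarm:.2f}")
print("weakest class:", weakest_class(FUSION_CM))
```

Applied to Tables 4 and 5, the same function also gives a recall of 1 and a maximum false-alarm rate of 0 for (h), and the weakest diagonal entry of Table 6 is (b), in line with the discussion of the confusion matrices.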
Fall detection, the most important function in practical use, was therefore achieved in this study.

Case | Only ceiling radar | Only wall radar | Combination of two radars
Classification accuracy | 90.3% | 91.5% | 95.6%
Table 3: Average classification accuracy for all cases

Figure 5: Learning curve of ceiling radar
Figure 6: Learning curve of wall radar
Figure 7: Learning curve of both radars

True \ Predicted | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h)
(a) | 0.90 | 0 | 0 | 0 | 0 | 0 | 0.09 | 0
(b) | 0 | 0.85 | 0 | 0.15 | 0 | 0 | 0 | 0
(c) | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0
(d) | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
(e) | 0 | 0 | 0 | 0 | 0.92 | 0.08 | 0 | 0
(f) | 0 | 0 | 0 | 0.22 | 0.06 | 0.72 | 0 | 0
(g) | 0.08 | 0 | 0 | 0 | 0 | 0 | 0.92 | 0
(h) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Table 4: Confusion matrix of ceiling radar

True \ Predicted | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h)
(a) | 0.93 | 0 | 0 | 0 | 0 | 0 | 0.07 | 0
(b) | 0 | 0.77 | 0 | 0 | 0 | 0 | 0.23 | 0
(c) | 0.08 | 0 | 0.92 | 0 | 0 | 0 | 0 | 0
(d) | 0 | 0 | 0 | 0.9 | 0.1 | 0 | 0 | 0
(e) | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
(f) | 0 | 0 | 0 | 0 | 0 | 0.92 | 0.07 | 0
(g) | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
(h) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Table 5: Confusion matrix of wall radar

True \ Predicted | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h)
(a) | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
(b) | 0 | 0.79 | 0 | 0 | 0 | 0 | 0.21 | 0
(c) | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0
(d) | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0
(e) | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
(f) | 0.07 | 0 | 0.07 | 0 | 0 | 0.87 | 0 | 0
(g) | 0.07 | 0 | 0 | 0 | 0 | 0 | 0.92 | 0
(h) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Table 6: Confusion matrix of both radars

5 Conclusion
In this study, a method for classifying the participants' behaviors in a restroom using two Doppler radars was proposed. In this method, the Doppler radar spectrograms acquired with the radars installed above and behind the participants were input to a CNN. The experimental results showed an accurate classification of 95.6% for eight types of realistic behaviors in the restroom, and falling was distinguished from the other behaviors with 100% accuracy. Future studies will improve the feasibility of the proposed radar system by including participants of different ages and conducting additional experiments in various types of restrooms.
References
World Population Prospects 2019: Ten Key Findings, https://esa.un.org/unpd/wpp/Publications/Files/WPP2019_10KeyFindings.pdf, 2019.
H. Gao, W. Lin, X. Yang, H. Li, N. Xu, J. Xie and Y. Li, "A new Network-based algorithm for Multi-camera abnormal activity detection," in 2011 IEEE International Symposium on Circuits and Systems, Rio de Janeiro, Brazil, 2011.
L. Meng, X. Kong and D. Taniguti, "Danger situations detection for the senior in toilet room using the center of gravity," in 2016 International Conference on Advanced Mechatronic Systems, Melbourne, VIC, Australia, 2016.
C. Ding, L. Zhang, C. Gu, L. Bai, Z. Liao, H. Hong, Y. Li and X. Zhu, "Non-Contact Human Motion Recognition Based on UWB Radar," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2018, pp. 306-315.
R. Zhang and S. Cao, "Real-Time Human Motion Behavior Detection via CNN Using mmWave Radar," IEEE Sensors Letters, vol. 3, 2019.
W. Takabatake, K. Yamamoto, K. Toyoda, T. Ohtsuki, Y. Shibata and A. Nagate, "FMCW Radar-Based Anomaly Detection in Toilet by Supervised Machine Learning Classifier," in 2019 IEEE Global Communications Conference, Waikoloa, HI, USA, 2019.
S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," in Proceedings of the 32nd International Conference on Machine Learning, 2015, pp. 448-456.