=Paper=
{{Paper
|id=Vol-2846/paper29
|storemode=property
|title=Towards Kinematically Constrained Real Time Human Pose Estimation using Sparse IMUs
|pdfUrl=https://ceur-ws.org/Vol-2846/paper29.pdf
|volume=Vol-2846
|authors=Deepak Nagaraj,Rhett Dobinson,Dirk Werth
|dblpUrl=https://dblp.org/rec/conf/aaaiss/NagarajDW21
}}
==Towards Kinematically Constrained Real Time Human Pose Estimation using Sparse IMUs==
Deepak Nagaraj, Rhett Dobinson and Dirk Werth

AWS-Institute for Digitized Products and Processes, Uni-Campus Nord, Saarbruecken, Saarland, Germany

===Abstract===
Real-time human posture estimation using a reduced number of sensors is a challenging and highly sought-after problem. Various model-based methods that utilize optical and/or inertial sensor data have been developed over the years in this direction. Although these methods have proven effective in laboratory settings, their applicability in the real world is limited by the difficulty of information gathering, high intrusiveness and higher cost. This non-position paper deals with a hybrid approach involving full-body inverse kinematics (IK) and deep learning in order to estimate physiologically feasible joint angles in real time, based on orientation information from 6 inertial measurement units (IMUs). IK is performed on a kinematically constrained 3D human body model to obtain the joint angles of the body model, given orientation data from 17 sensors attached to different bone segments of the body. A bidirectional recurrent neural network (bi-RNN) is then trained on a newly collected IMU dataset to regress from the orientation data of 6 sensors to the joint angles obtained from IK. Training converged to a mean squared error (MSE) of 5.98 degrees on the test set.

Keywords: MoCap, Recurrent Neural Network, Inverse kinematics (IK), Sparse IMUs

===1. Introduction and Motivation===
Automated and accurate human motion capture and analysis is a key requirement in many applications, such as motion rehabilitation, performance analysis of athletes, ergonomic assessment in the workplace and AR/VR applications. The two main types of motion capture systems in use are optical or ultrasound-based line-of-sight (LoS) methods, which require a fixed marker sensor structure, and inertial sensor systems.
Although LoS methods capture motion with good accuracy in laboratory environments, they are poorly suited to unconstrained free-space movement, because they require the locations and orientations of their sensing elements (i.e. cameras) to be known and fixed. IMUs, on the other hand, each consist of an accelerometer, a gyroscope and a magnetometer, which measure physical quantities such as acceleration, angular velocity and the local magnetic field, and they are relatively robust in real-world environments. Commercially available inertial sensor systems for full-body human motion capture consist of around 20 IMUs. In such systems, the orientation of each sensor is obtained by combining accelerometer, gyroscope and magnetometer data in a sensor fusion framework, such as Kalman filtering [1]. The data from each IMU is then applied to a human bio-mechanical model using kinematic equations to obtain the position and orientation of the body segments. Xsens [2], which uses 17 sensors to reconstruct posture, demonstrated an RMS difference of less than 5 degrees in joint angles when compared to an optical position measurement system. However, requiring the subject to wear a suit of 17 sensors and other accessories makes the system highly intrusive and affects normal movement.

In: A. Martin, K. Hinkelmann, H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen (Eds.), Proceedings of the AAAI 2021 Spring Symposium on Combining Machine Learning and Knowledge Engineering (AAAI-MAKE 2021), Stanford University, Palo Alto, California, USA, March 22-24, 2021. EMAIL: deepak.nagaraj@aws-institut.de (Deepak Nagaraj); rhett.dobinson@aws-institut.de (Rhett Dobinson); dirk.werth@aws-institut.de (Dirk Werth). ORCID: 0000-0003-1102-1619. © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

Recently, several studies have worked towards real-time full-body pose estimation using a reduced number of IMUs. In particular, deep learning (DL) based methods have been investigated in order to approximate the mapping from raw IMU data to the joint angles of a 3D body [3]. The major challenge in generalizing such a system, as in most DL approaches, is acquiring the vast amounts of data needed to represent all kinds of poses and tasks. This problem becomes even more pronounced if the 3D body model, whose parameters are used as ground truth for supervised DL, is not kinematically well constrained. To overcome this problem, [3] trains a DL model on a large synthetic dataset, generated via forward kinematics with virtual sensors placed on a novel body model, and predicts pose parameters from real IMU inputs in real time. However, because virtual sensor data is free of noise and drift, the model performed much better when trained only on synthetic data than when trained on real IMU data.

We try to address this problem by better constraining the 3D body model, and accordingly the pose estimation, using research from the field of biomechanics on the modeling and analysis of human motion. IK can be performed on the sensor orientation data to obtain ground-truth joint angles with respect to a sufficiently constrained bio-mechanical model. These joint angles, along with orientation data from a reduced number of real IMUs, can then be used to train a DL model for pose estimation.

===2. Method===
The experimental methodology is shown in figure 1. The leftmost component depicts the placement of the 17 Xsens sensors on the body. Given sufficient data covering various movements, IK is performed using the orientation values of all 17 sensors to obtain the joint angles of the OpenSim body model.
Then, a bi-RNN is trained to regress from the orientation values of 6 sensors (encircled in green) to the joint angles obtained from IK. During evaluation, the output (joint angles) of the bi-RNN on the test dataset is visualized using OpenSim.

Figure 1: Proposed approach involving Inverse Kinematics and Deep Learning

====2.1. Data Collection====
Wearable IMU data collection has already been carried out within the scope of the project BauPrevent (more details in the Acknowledgements section). As the project focuses on preventive health, specifically in the construction domain, data was obtained from construction workers wearing IMU sensors while undertaking their usual tasks. Seven hours of data from fifteen different subjects were collected, resulting in 143 different motion sequences and close to a million IMU data frames. Various activities were performed, targeting specific types of stress as well as practical tasks from the workers' typical daily routine. The following types of activities were recorded: lifting and carrying weights, lifting weights up onto a workbench, overhead fastening, plastering, grouting, sweeping, painting, hanging wallpaper and assembling/disassembling scaffolding. We used the commercially available Xsens [2] system, which consists of 17 IMUs in total. The raw IMU data was collected along with computed information such as sensor and bone-segment orientations and joint angles; however, we only use the sensor orientation data, as described in the next sections.

====2.2. Inverse Kinematics====
OpenSim [5], a widely used open-source software package for biomechanical analysis, was used to obtain the joint-angle ground truth with respect to the chosen body model by carrying out IK. Through its OpenSense API feature, OpenSim provides an offline measurement-scaling IK pipeline, in which the orientations of virtual sensors placed on the model's bone segments are fitted to the raw sensor orientations.
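
To make "fitting" virtual sensor orientations to measured ones concrete, the following minimal sketch computes the angular error between two unit quaternions and a weighted sum of squared errors over all sensors. It is an illustration only: the paper does not state the cost function OpenSim uses, and the function names, the (w, x, y, z) component order and the optional weights are assumptions made for this example.

```python
import math

def quat_conjugate(q):
    """Conjugate of a quaternion given as (w, x, y, z)."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def angular_error_deg(q_model, q_measured):
    """Smallest rotation angle in degrees between two unit quaternions."""
    rel = quat_multiply(quat_conjugate(q_model), q_measured)
    # abs() treats q and -q as the same rotation; min() guards against
    # rounding pushing the value slightly above 1.
    w = min(1.0, abs(rel[0]))
    return math.degrees(2.0 * math.acos(w))

def orientation_cost(model_quats, measured_quats, weights=None):
    """Weighted sum of squared angular errors over all sensors (assumed cost)."""
    if weights is None:
        weights = [1.0] * len(model_quats)
    return sum(
        w * angular_error_deg(qm, qs) ** 2
        for w, qm, qs in zip(weights, model_quats, measured_quats)
    )
```

An IK solver of this kind would search for the joint angles that minimize such a cost at every time step; the kinematic constraints of the body model restrict which joint angles are admissible.
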
The model-defined anatomical joint angles that represent the movements were then extracted from the OpenSim MOT motion files. In order to represent physiologically feasible joint angles effectively and efficiently, we chose the body model presented in [4], which was available as an OpenSim model. The model, in XML format, contains all the information needed for the biomechanical description of the human body, including body segments, kinematic constraints (joints) and dynamic constraints (i.e. muscles). Since OpenSim models can be freely edited, the number of joint parameters can be adjusted according to the constraint requirements. In our case, the requirement was to constrain the model for DL with a reduced number of sensors. This meant using the smallest possible number of joint parameters while retaining the most important degrees of freedom (DOFs) and still allowing an effective representation of posture. Our resulting body model reduces the number of joint parameters (DOFs) from 165 to 26, relating to the 13 joints (green dots) depicted on the 3D kinematic body model in figure 1. The original body model [4] has a very detailed spine with 126 DOFs, whose magnitudes were mostly observed to lie in the range of 0 to 4 degrees; of these, only 4 DOFs are retained. The 26 parameters thus obtained are the ground-truth values for training the bi-RNN.

====2.3. Deep Learning====
The model architecture and the training methodology are similar to those demonstrated in [3]. A two-layer bi-RNN with 512 LSTM units per layer was trained using the sensor orientation and joint parameter data discussed in sections 2.1 and 2.2. The sensor placement was similar to that of [3]. However, unlike their approach, which used sensor orientations as rotation matrices and acceleration as an additional input, our study used only sensor orientations, represented as quaternions. The reasons are that acceleration data is often too noisy, that it is already incorporated into the orientation estimate through sensor fusion, and that the OpenSim API does not require it for carrying out IK.
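
As a rough illustration of the quaternion-only input, the sketch below flattens one time step of six sensor orientations into a single feature vector of 6 × 4 = 24 values. The sensor labels and their order are hypothetical (the paper only says the placement is similar to [3]), and re-normalizing each quaternion is an assumption of this example rather than a step described in the paper.

```python
import math

# Hypothetical sensor labels and ordering, chosen only for this example.
SENSOR_ORDER = (
    "head", "pelvis",
    "left_forearm", "right_forearm",
    "left_shank", "right_shank",
)

def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    return tuple(c / norm for c in q)

def frame_to_features(frame):
    """Flatten one frame {sensor label: quaternion} into a list of 24 floats."""
    features = []
    for name in SENSOR_ORDER:
        features.extend(normalize(frame[name]))
    return features
```

A sequence of such 24-value vectors would form the network input, with the 26 joint parameters from IK as the regression target at each time step.
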
The model takes as input the orientations of the 6 sensors, each a 4D quaternion (24 values in total), and outputs the 26 OpenSim model joint parameters obtained from IK. The 143 motion sequences, which have varying numbers of time-steps, were divided into batches of 300 time-steps. The data was split into training and test sets at a ratio of 0.8 to 0.2 and then shuffled. As depicted in figure 2, the mean squared error on the training and test sets converged to 3.30 degrees and 5.98 degrees respectively after 89 epochs.

Figure 2: Convergence of MSE during training the bi-RNN

Figure 3 shows, visualized through OpenSim, the output of the bi-RNN model (white skeleton) on test data alongside the ground truth (person) and the output from IK (blue skeleton). Because of the kinematic constraints on the body segments, the posture matches the ground truth and the IK output reasonably well, despite the reduced number of DOFs in the spine and the lower extremities of the limbs.

Figure 3: Comparison of the resulting postures with the ground truth and output from IK

===3. Conclusion and Outlook===
In this paper, we present the first step towards a hybrid approach involving full-body IK and deep learning for accurate estimation of physiologically feasible joint angles in real time, based on orientation information from 6 inertial measurement units (IMUs). Recent advances in bio-mechanical modeling are taken into account to constrain the output of the deep learning model. The approach has shown promising results. However, it must be tested extensively on further datasets, especially on movements beyond those pertaining to construction work. To this end, we are in the process of collecting a raw dataset with a 6-IMU hardware system that we developed. Furthermore, as proposed in [6], diverse bi-RNNs are to be trained with variable window sizes and random input sequences, forming an ensemble of models for estimating poses more accurately and robustly.
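
The fixed-length batching described in section 2.3, and the variable window size mentioned in the outlook, both amount to cutting each motion sequence into windows of a chosen length. The following sketch shows one way to do this together with the 0.8 to 0.2 split. Several details are assumptions, since the paper does not specify them: windows do not overlap, a trailing remainder shorter than the window is dropped, the split is made over windows rather than whole sequences, and shuffling happens after the split with a fixed seed.

```python
import random

def split_into_windows(sequence, window=300):
    """Cut one motion sequence (a list of frames) into non-overlapping windows.

    A trailing remainder shorter than `window` is dropped (assumption).
    """
    n_full = len(sequence) // window
    return [sequence[i * window:(i + 1) * window] for i in range(n_full)]

def train_test_split(windows, train_fraction=0.8, seed=0):
    """Split windows into training and test sets, then shuffle each set."""
    n_train = int(len(windows) * train_fraction)
    train = list(windows[:n_train])
    test = list(windows[n_train:])
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```

Calling `split_into_windows` with a different `window` value per model would be one simple way to prepare data for bi-RNNs trained with variable window sizes.
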
===Acknowledgements===
This work is based on BauPrevent, a project partly funded by the German Federal Ministry of Education and Research (BMBF) and the European Social Fund (ESF) as part of the “Zukunft der Arbeit: Mittelstand - innovativ und sozial” program, reference number 02L17C011.

===References===
[1] X. Yun and E. R. Bachmann, Design, Implementation, and Experimental Results of a Quaternion-Based Kalman Filter for Human Body Motion Tracking, IEEE Transactions on Robotics 22, 1216–1227 (2007).

[2] M. Schepers, M. Giuberti, and G. Bellusci, Xsens MVN: Consistent Tracking of Human Motion Using Inertial Sensing (2018). DOI=10.13140/RG.2.2.22099.07205.

[3] Y. Huang, M. Kaufmann, E. Aksan, M. Black, O. Hilliges, and G. Pons-Moll, Deep Inertial Poser: Learning to Reconstruct Human Pose from Sparse Inertial Measurements in Real Time, ACM Trans. Graph. 37, 185:1–185:15 (2018). DOI=10.1145/3272127.3275108.

[4] S. Schmid, K. A. Burkhart, B. T. Allaire, D. Grindle, and D. E. Anderson, Musculoskeletal full-body models including a detailed thoracolumbar spine for children and adolescents aged 6–18 years, Journal of Biomechanics 102, 109305 (2020).

[5] S. Delp, F. Anderson, A. Arnold, P. Loan, A. Habib, C. John, E. Guendelman, and D. Thelen, OpenSim: Open-Source Software to Create and Analyze Dynamic Simulations of Movement, IEEE Transactions on Biomedical Engineering 54, 1940–1950 (2007).

[6] D. Nagaraj, E. Schake, P. Leiner, and D. Werth, An RNN-Ensemble Approach for Real Time Human Pose Estimation from Sparse IMUs, Association for Computing Machinery, New York, NY, USA (2020). DOI=10.1145/3378184.3378228.