Modeling Human Motion and Contact with Gaussian Processes
Olivier Rémillard∗    Paul Kry†

School of Computer Science, McGill University


ABSTRACT

We introduce a simple approach for using Gaussian processes to model human motion involving contact. It comprises a low-dimensional latent space with dynamics augmented by a switching variable. A Gaussian process models the time relationship of a motion, while the switching variable helps model the discontinuities created by interaction with the environment.

1 INTRODUCTION

Generating realistic and lifelike animated characters from captured motion sequences is a hard and time-consuming task. The task is challenging due to the high dimensionality of human pose data and the complexity of the motion. Gaussian processes (GPs) are useful for modeling the dynamics of human movement when combined with a latent variable model to approximate the lower dimensional manifold of human motion. These models alleviate the difficult problem of explicitly modeling physics and control, while providing a means of predicting behaviour, with applications in tracking and motion capture reuse.

Figure 1: Effect of PCA reduction on foot location over a walking sequence of 125 poses. The blue curve depicts the cumulative energy for a given dimension. The red curve shows the mean error on the foot location of the reconstructed sequence in comparison to the original sequence. The error bars show 2 standard deviations. [Plot axes: Dimension (1 to 10) on the horizontal axis; Cumulative Energy (%) on the left axis (0 to 1); Error (cm) on the right axis (0 to 10).]
The GP model gives good predictions if the latent trajectories are smooth. However, in many cases the latent trajectories are not smooth because they include abrupt changes when the actor's body motion suddenly changes. This often occurs when a knee or elbow joint reaches its full extent and locks, or when the actor interacts with the environment. These interactions can be as simple as stepping on the floor, or pushing or pulling an object. Our solution borrows ideas from the Switching Gaussian Process Dynamical Model (SGPDM) [1]. However, instead of using the switching variable to separate different motions, say walking and running, we use it to separate the nonsmooth dynamics acting on the motion into distinct sets. This separation further permits the use of simpler techniques such as principal component analysis (PCA) to reduce the dimension of the original problem.

2 APPROACH

We represent 3D human motion as a sequence of joint angles that describe how the pose changes over time; thus, a pose can be packaged in a vector, and a motion in a matrix. Assuming first-order Markov dynamics, modeling human motion is the task of computing

$$p(q_{t+1} \mid q_t),$$

where $q_{t+1}$ denotes the pose that follows the current pose $q_t$. However, due to the high dimensionality of a pose (over 60 dimensions), most modeling techniques yield poor results. It is therefore preferable to reduce the size of the input space.

We reduce the dimension of the space with a linear mapping

$$z_t = f(q_t),$$

where $q_t$ is the pose and $z_t$ its low-dimensional latent coordinates. Since $f$ is a linear transformation, we can choose $f$ such that its inverse $f^{-1}$ exists and use it to transform latent coordinates back to poses:

$$q'_t = f^{-1}(z_t).$$

The result, $q'_t$, is not equal to the original pose because $f$ is a projection and the inverse does not reconstruct the original full space.

Figure 1 illustrates the trade-off between the latent space dimension and the resulting reconstruction error at an end effector of the body. Because a pose is described by a hierarchy of joints and angles, the error accumulates the deeper a body part is in the hierarchy. Consequently, the error is greatest at the feet and hands. We therefore use the discrepancy between the foot locations in $q_t$ and $q'_t$ as an indicator of the overall quality of the mapping. We choose the first three principal components of the observed pose data as the transformation $f$ because this captures 90% of the variation in the original motion (see Figure 1).

With dimension reduction, the model simplifies to

$$p(z_{t+1} \mid z_t).$$

We model the time dependence between two consecutive poses with a GP, a non-parametric approach to solving regression problems.

Given a training motion sequence $Q \in \mathbb{R}^{N \times D}$ and its latent sequence $f(Q) = Z \in \mathbb{R}^{N \times d}$, we can model the dynamics with

$$p(Z_+ \mid Z_-, \theta) = \frac{\exp\left(-\frac{1}{2}\,\mathrm{trace}\left(Z_+^T K^{-1} Z_+\right)\right)}{\sqrt{(2\pi)^{(N-1)d}\,|K|^d}},$$

where $K$ is the process kernel (covariance of the inputs), $\theta$ the kernel's hyperparameters, and $Z_+ = [z_2, \ldots, z_N]^T$ is the vector of states that follow $Z_- = [z_1, \ldots, z_{N-1}]^T$. The GP is trained by solving

$$\theta = \arg\max_{\theta}\; p(Z_+ \mid Z_-, \theta),$$

that is, by optimizing the log likelihood of $p(Z_+ \mid Z_-, \theta)$ using scaled conjugate gradient methods. Once trained, we obtain

$$p(z_{t+1} \mid z_t) = \mathcal{N}\left(\mu(z_t), \sigma^2(z_t)\right),$$

$$\mu(z) = Z_+^T K^{-1} k(z, Z_-),$$

$$\sigma^2(z) = k(z, z) - k(z, Z_-)^T K^{-1} k(z, Z_-).$$

∗ e-mail: remillardo@msn.com
† e-mail: kry@cs.mcgill.ca
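The linear reduction and the predictive equations above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the squared-exponential kernel, the fixed hyperparameters (no scaled conjugate gradient optimization is performed), the jitter term, the synthetic pose data, and all function names (`fit_pca`, `rbf_kernel`, `fit_gp_dynamics`, `predict`) are assumptions made for illustration.

```python
import numpy as np


def fit_pca(Q, d=3):
    """Linear map f built from the top-d principal components of Q (N x D)."""
    mean = Q.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Q - mean, full_matrices=False)
    W = Vt[:d].T  # D x d, orthonormal columns

    def f(q):
        return (q - mean) @ W      # pose -> latent coordinates z

    def f_inv(z):
        return z @ W.T + mean      # latent -> approximate pose q'

    return f, f_inv


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential covariance (an assumed kernel choice)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def fit_gp_dynamics(Z, length_scale=1.0, jitter=1e-6):
    """Precompute K^{-1} Z_+ for the one-step model p(z_{t+1} | z_t)."""
    Z_minus, Z_plus = Z[:-1], Z[1:]
    K = rbf_kernel(Z_minus, Z_minus, length_scale)
    K += jitter * np.eye(len(Z_minus))  # assumed jitter for numerical stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Z_plus))  # K^{-1} Z_+
    return Z_minus, L, alpha, length_scale


def predict(model, z):
    """Return mu(z) and sigma^2(z) for one latent state z of shape (d,)."""
    Z_minus, L, alpha, length_scale = model
    k_star = rbf_kernel(Z_minus, z[None, :], length_scale)[:, 0]  # k(z, Z_-)
    mu = alpha.T @ k_star            # Z_+^T K^{-1} k(z, Z_-)
    v = np.linalg.solve(L, k_star)
    var = 1.0 - v @ v                # k(z, z) = 1 for this kernel
    return mu, var


# Synthetic stand-in for motion capture: 125 poses with 60 dimensions each.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 3.0 * np.pi, 125)
latent = np.column_stack([np.cos(t), np.sin(t), 0.1 * np.sin(2.0 * t)])
Q = latent @ (rng.normal(size=(3, 60)) / np.sqrt(60.0))

f, f_inv = fit_pca(Q, d=3)
Z = f(Q)                             # latent sequence, N x d
model = fit_gp_dynamics(Z)
mu, var = predict(model, Z[10])      # prediction of the state following Z[10]
print(np.abs(mu - Z[11]).max(), var)
```

Because the synthetic poses lie exactly in a three-dimensional subspace, `f_inv(f(Q))` recovers `Q` here; with real motion data the reconstruction is only approximate, as discussed for Figure 1.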


Figure 2: Walk. From left to right: a pose, the trajectory in latent space, inference of the model with no switching, inference with switching. The sequence consists of one and a half cycles of walking for a total of 125 poses. The sequence is separated into 2 states S: contact with the left foot (blue) and contact with the right foot (yellow).


Figure 3: Rowing machine. From left to right: a pose, the trajectory in latent space, inference of the model with no switching, inference with switching. The sequence consists of two and a half cycles of paddling for a total of more than 200 poses. The sequence is divided into 3 states S: pushing on the oars (yellow), pulling on the oars (blue), and pause between pushing and pulling (red).


Here, $k(z, Z_-)$ is the covariance function applied to the input $z$ and the training set inputs $Z_-$. Details of each step are found in [2, 3].

The linear transformation and the Gaussian process described so far are sufficient to model some motions. Given an initial pose $q_0$, we seek $q_t$, the $t$-th pose following $q_0$. We start with $z_0 = f(q_0)$ and iteratively use the mean of the Gaussian process to obtain $q_t$ by

$$q_t = f^{-1}(z_t),$$

$$z_t = \mu(z_{t-1}).$$

2.1 Switching Models at Contact

The problem with the usual formulation of the GP model is that it tends to smooth sharp turns in the trajectory. These discontinuities are characteristic of the contact forces acting on the human in motion, and those contacts should also be correctly modeled.

To cope with this, we add to the model a switching variable $s \in S$ that divides the motion into smaller subsets. Each value of the variable should describe a situation where specific contact forces are in action. For example, for walking we could have 4 values, S = {no contact, left foot, right foot, both}.

The switching variable permits the decomposition of $p(z_{t+1} \mid z_t)$ along the values of $S$, and the training of $|S|$ individual GP models [1]. In addition, a mapping from latent space coordinates to switching values can be expressed as a GP classification problem, as explained in [2].

Inference in this model proceeds in the same way as in the previous formulation; the main difference is the use of more than one Gaussian process. When stepping in the latent space, $z_t = \mu(z_{t-1})$, we use only the mean function of the GP trained for the switching value of $z_{t-1}$.

3 RESULTS

Figures 2 and 3 show two examples where model switching is of benefit (a walking motion, and a paddling motion on a rowing machine). The figures show the inference of 1000 poses using a model without the switching variable, and 1000 poses using the switching variable. The two models differ in their ability to model the discontinuities of the trajectory. The main consequence of smoothing the discontinuities for the walking sequence is the accentuation of foot skating. For the paddling sequence, the consequence is the elimination of the pause (red) between the pushing and pulling movements.

4 CONCLUSION

We presented a way to model human motion and contact using Gaussian processes and a switching variable. The models we produce are useful for generating arbitrary-length sequences of cyclic motion that can be adjusted to fit specific situations, and we have the added benefit that the switching variable helps model discontinuities due to contacts. One limitation is that we must label the switches in the training data. As future work, it would be interesting to use an unsupervised framework to choose labels that optimize the fit of the model.

REFERENCES

[1] J. Chen, M. Kim, Y. Wang, and Q. Ji. Switching Gaussian process dynamic models for simultaneous composite motion tracking and recognition. In Computer Vision and Pattern Recognition, pages 2655–2662, 2009.
[2] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[3] J. M. Wang, D. J. Fleet, and A. Hertzmann. Gaussian process dynamical models for human motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:283–298, 2008.
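To make the inference procedure of Section 2.1 concrete, the following minimal sketch trains one GP mean function per switching value and steps through the latent space with the GP selected by the switching value of the current state. It is an illustration under stated assumptions, not the authors' code: the kernel and its fixed length scale, the jitter, the toy two-state trajectory (an arc followed by a straight segment, giving sharp corners), and all names (`GPMean`, `fit_switching`, `nearest_label`, `rollout`) are hypothetical. In particular, a nearest-neighbour lookup stands in for the GP classifier of [2].

```python
import numpy as np


def rbf(A, B, ell=1.0):
    """Squared-exponential covariance (an assumed kernel choice)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / ell**2)


class GPMean:
    """Mean function mu(z) = Z_+^T K^{-1} k(z, Z_-) of a single GP."""

    def __init__(self, Z_in, Z_out, ell=1.0, jitter=1e-6):
        self.Z_in, self.ell = Z_in, ell
        K = rbf(Z_in, Z_in, ell) + jitter * np.eye(len(Z_in))
        self.alpha = np.linalg.solve(K, Z_out)  # K^{-1} Z_+

    def __call__(self, z):
        return (rbf(z[None, :], self.Z_in, self.ell) @ self.alpha)[0]


def fit_switching(Z, labels, ell=1.0):
    """Train one GP per switching value on the transitions that start in it."""
    Z_in, Z_out, s_in = Z[:-1], Z[1:], labels[:-1]
    models = {}
    for s in np.unique(s_in):
        mask = s_in == s
        models[int(s)] = GPMean(Z_in[mask], Z_out[mask], ell)
    return models


def nearest_label(z, Z, labels):
    """Stand-in for the GP classifier: label of the nearest training state."""
    return int(labels[np.argmin(((Z - z) ** 2).sum(axis=1))])


def rollout(z0, steps, models, Z, labels):
    """Iterate z_t = mu_s(z_{t-1}), with s the switching value of z_{t-1}."""
    traj = [np.asarray(z0, dtype=float)]
    for _ in range(steps):
        s = nearest_label(traj[-1], Z, labels)
        traj.append(models[s](traj[-1]))
    return np.array(traj)


# Toy cyclic trajectory with two sharp corners at (1, 0) and (-1, 0).
n = 20
theta = np.arange(n) * np.pi / n
arc = np.column_stack([np.cos(theta), np.sin(theta)])                  # state 0
line = np.column_stack([-1.0 + 2.0 * np.arange(n) / n, np.zeros(n)])   # state 1
Z = np.vstack([arc, line, arc[:1]])          # repeat the start to close the cycle
labels = np.array([0] * n + [1] * n + [0])

models = fit_switching(Z, labels)
traj = rollout(Z[0], 4 * n, models, Z, labels)   # two full cycles
```

Each GP only ever sees transitions from one regime, so neither has to fit the corner where the arc meets the straight segment; the corner is produced by switching models rather than by a single smooth mean function.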

