<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modeling Human Motion and Contact with Gaussian Processes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olivier Rémillard</string-name>
          <email>remillardo@msn.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Kry</string-name>
          <email>kry@cs.mcgill.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, McGill University</institution>
        </aff>
      </contrib-group>
      <fpage>8</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>We introduce a simple approach for using Gaussian processes to model human motion involving contact. It comprises a low-dimensional latent space whose dynamics are augmented by a switching variable. A Gaussian process models the temporal relationship within a motion, while the switching variable helps model the discontinuities created by interaction with the environment.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Generating realistic and lifelike animated characters from captured motion sequences is a difficult and time-consuming task, owing to the high dimensionality of human pose data and the complexity of the motion. Gaussian processes (GPs) are useful for modeling the dynamics of human movement when combined with a latent variable model that approximates the lower-dimensional manifold of human motion. These models avoid the difficult problem of explicitly modeling physics and control while still providing a means of predicting behaviour, with applications in tracking and motion capture reuse. The GP model gives good predictions when the latent trajectories are smooth. In many cases, however, the latent trajectories are not smooth: they contain abrupt changes when the actor's body motion suddenly changes. This happens often when a knee or elbow joint reaches its full extension and locks, or when the actor interacts with the environment. These interactions can be as simple as stepping on the floor or pushing or pulling an object. Our solution borrows ideas from the Switching Gaussian Process Dynamic Model (SGPDM) [<xref ref-type="bibr" rid="ref1">1</xref>]. However, instead of using the switching variable to separate different motions, say walking and running, we use it to separate the nonsmooth dynamics acting on the motion into distinct sets. This separation further permits the use of simpler techniques, such as principal component analysis (PCA), to reduce the dimension of the original problem.</p>
    </sec>
    <sec id="sec-2">
      <title>APPROACH</title>
      <p>We represent 3D human motion as a sequence of joint angles that
describes how the pose changes over time; a pose can thus be
packaged in a vector, and a motion in a matrix. Assuming first-order
Markov dynamics, modeling human motion is the task of computing
<disp-formula><tex-math>p(q_{t+1} \mid q_t),</tex-math></disp-formula>
where $q_{t+1}$ denotes the pose following the current pose $q_t$.
However, because a pose is high-dimensional (over 60 dimensions),
most modeling techniques yield poor results, so it is preferable
to reduce the size of the input space.</p>
      <p>We reduce the dimension of the space with a linear mapping
<disp-formula><tex-math>z_t = f(q_t),</tex-math></disp-formula>
where $q_t$ is the pose and $z_t$ its low-dimensional latent coordinates.
Since $f$ is a linear transformation, we can choose $f$ such that a
reconstruction mapping $f^{-1}$ exists (a pseudo-inverse, since $f$ reduces
dimension) and use it to transform latent coordinates back to poses:
<disp-formula><tex-math>q'_t = f^{-1}(z_t).</tex-math></disp-formula>
The result, $q'_t$, is not equal to the original pose because $f$ is a
projection, and $f^{-1}$ cannot recover the components of the full space
that the projection discards.</p>
      <p>Figure 1: Effect of PCA reduction on foot location over a walking
sequence of 125 poses (horizontal axis: dimension, 1 to 9; vertical axes:
cumulative energy (%) and error (cm)). The blue curve depicts the cumulative
energy for a specific dimension. The red curve shows the mean error
on the foot location of the reconstructed sequence in comparison to
the original sequence. The error bars show 2 standard deviations.</p>
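      <p>The mapping $f$ and its reconstruction $f^{-1}$ can be sketched with NumPy, assuming $f$ is PCA computed by a singular value decomposition of the mean-centred poses (the text adopts PCA in the next paragraph). The function names and the synthetic 125 by 60 pose matrix are hypothetical stand-ins, not the authors' code or data. The sketch also computes the cumulative energy plotted in Figure 1 and shows that decoding does not return the original poses exactly.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def fit_pca(Q):
    """Principal directions and cumulative energy of a pose matrix Q (N poses x D angles)."""
    mean = Q.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Q - mean, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)   # fraction of variance kept by the first d components
    return mean, Vt, energy

def encode(q, mean, W):
    # f: pose(s) to latent coordinates, z = W (q - mean)
    return (q - mean) @ W.T

def decode(z, mean, W):
    # f^{-1}: latent coordinates back to poses. W has orthonormal rows, so this is the
    # pseudo-inverse of f; components discarded by the projection cannot be recovered.
    return z @ W + mean

# Synthetic stand-in for a motion: 125 poses of 60 angles lying near a 3-D subspace.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 125)
latent = np.stack([np.sin(t), np.cos(t), np.sin(2 * t)], axis=1)
Q = latent @ rng.normal(size=(3, 60)) + 0.01 * rng.normal(size=(125, 60))

mean, Vt, energy = fit_pca(Q)
d = int(np.searchsorted(energy, 0.9)) + 1      # smallest d keeping at least 90% of the energy
W = Vt[:3]                                     # keep three components, as in the text
Z = encode(Q, mean, W)                         # latent trajectory, shape (125, 3)
Q_rec = decode(Z, mean, W)                     # q'_t: close to, but not exactly, q_t
print(d, energy[:4].round(4), np.abs(Q - Q_rec).max())
```
      </preformat>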
      <p>Figure 1 illustrates the trade-off between the latent space
dimension and the resulting reconstruction error at the end effectors of the
body. Because a pose is described as a hierarchy of joints and angles,
the error accumulates the deeper a body part lies in the hierarchy;
consequently, the error is greatest at the feet and hands.
We therefore use the discrepancy between the foot locations in $q_t$ and $q'_t$ as
an indicator of the overall quality of the mapping. We use the
first three principal components of the observed pose data as the
transformation $f$ because they capture 90% of the variation in the
original motion (see Figure 1).</p>
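      <p>A toy illustration of why errors accumulate along the hierarchy, using a hypothetical planar chain of five unit links rather than a real skeleton: every joint receives the same 2° angle error, yet the position error grows from the root to the tip because each joint's angle is expressed relative to its parent.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def joint_positions(angles, link_length=1.0):
    """Forward kinematics of a planar chain whose joint angles are relative to their parent."""
    absolute = np.cumsum(angles)   # each link's orientation accumulates every parent's angle
    steps = link_length * np.stack([np.cos(absolute), np.sin(absolute)], axis=1)
    return np.cumsum(steps, axis=0)  # tip position of each link, from root to end effector

angles = np.zeros(5)                         # hypothetical straight chain of five unit links
perturbed = angles + np.deg2rad(2.0)         # the same small error at every joint
error = np.linalg.norm(joint_positions(perturbed) - joint_positions(angles), axis=1)
print(np.round(error, 3))                    # grows from the root joint to the tip
```
      </preformat>
      <p>A real skeleton is three-dimensional and the PCA error differs from joint to joint, but the same compounding through parent transforms is what makes the feet and hands the most sensitive indicators.</p>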
      <p>With dimension reduction, the model simplifies to
<disp-formula><tex-math>p(z_{t+1} \mid z_t).</tex-math></disp-formula>
We model the time dependence between two consecutive poses
with a GP, a non-parametric approach to regression
problems.</p>
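      <p>Given a training motion sequence $Q \in \mathbb{R}^{N \times D}$ and its latent
sequence $f(Q) = Z \in \mathbb{R}^{N \times d}$, we can model the dynamics with
<disp-formula><tex-math>p(Z_+ \mid Z_-, \theta) = \frac{\exp\left(-\frac{1}{2}\operatorname{tr}\left(Z_+^\top K^{-1} Z_+\right)\right)}{\sqrt{(2\pi)^{(N-1)d}\,|K|^{d}}},</tex-math></disp-formula>
where $K$ is the $(N-1) \times (N-1)$ kernel matrix (the covariance of the inputs $Z_-$), $\theta$ the
kernel's hyperparameters, and $Z_+ = [z_2, \ldots, z_N]^\top$ the matrix of states
that follow $Z_- = [z_1, \ldots, z_{N-1}]^\top$. The hyperparameters are learned by maximizing this likelihood:
<disp-formula><tex-math>\theta^* = \arg\max_{\theta} \, p(Z_+ \mid Z_-, \theta).</tex-math></disp-formula></p>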
      <p>In practice, we maximize the logarithm of $p(Z_+ \mid Z_-, \theta)$ with the scaled
conjugate gradient method. Once the model is trained, we obtain
<disp-formula><tex-math>p(z_{t+1} \mid z_t) = \mathcal{N}\left(m(z_t), \sigma^2(z_t)\right),</tex-math></disp-formula>
with
<disp-formula><tex-math>m(z) = Z_+^\top K^{-1} k(z, Z_-),</tex-math></disp-formula>
<disp-formula><tex-math>\sigma^2(z) = k(z, z) - k(z, Z_-)^\top K^{-1} k(z, Z_-),</tex-math></disp-formula>
where $k(z, Z_-)$ is the covariance function applied to the input $z$ and
the training set inputs $Z_-$. Details of each step are found in [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
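      <p>A minimal NumPy sketch of the GP dynamics model above. The squared-exponential kernel with additive white noise, the hyperparameters $\theta$ = (lengthscale, signal variance, noise variance), the class name <monospace>GPDynamics</monospace>, and the coarse grid search over $\theta$ (standing in for scaled conjugate gradients) are all assumptions made for illustration; the text does not specify the kernel. The sketch evaluates the log of $p(Z_+ \mid Z_-, \theta)$ and the predictive mean $m(z)$ and variance $\sigma^2(z)$.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def rbf_kernel(A, B, lengthscale, signal_var):
    """Squared-exponential covariance between the rows of A and B (assumed kernel)."""
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

class GPDynamics:
    """GP model of p(z_{t+1} | z_t) trained on one latent trajectory Z of shape (N, d)."""

    def __init__(self, Z, theta):
        self.lengthscale, self.signal_var, self.noise_var = theta
        self.Z_minus, self.Z_plus = Z[:-1], Z[1:]   # inputs z_1..z_{N-1}, targets z_2..z_N
        n = len(self.Z_minus)
        self.K = self.kern(self.Z_minus, self.Z_minus) + self.noise_var * np.eye(n)
        self.alpha = np.linalg.solve(self.K, self.Z_plus)   # K^{-1} Z_+

    def kern(self, A, B):
        return rbf_kernel(A, B, self.lengthscale, self.signal_var)

    def log_likelihood(self):
        # log p(Z_+ | Z_-, theta) = -1/2 tr(Z_+^T K^{-1} Z_+) - (N-1)d/2 log(2 pi) - d/2 log|K|
        n, d = self.Z_plus.shape
        _, log_det_K = np.linalg.slogdet(self.K)
        trace_term = np.sum(self.Z_plus * self.alpha)
        return -0.5 * trace_term - 0.5 * n * d * np.log(2 * np.pi) - 0.5 * d * log_det_K

    def mean(self, z):
        k = self.kern(self.Z_minus, z[None, :])[:, 0]    # k(z, Z_-)
        return self.alpha.T @ k                           # m(z) = Z_+^T K^{-1} k(z, Z_-)

    def variance(self, z):
        k = self.kern(self.Z_minus, z[None, :])[:, 0]
        k_zz = self.kern(z[None, :], z[None, :])[0, 0]
        return k_zz - k @ np.linalg.solve(self.K, k)     # sigma^2(z)

# Synthetic latent loop standing in for a PCA-reduced cyclic motion.
t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
Z = np.stack([np.cos(t), np.sin(t)], axis=1)

# Coarse grid search over theta, standing in for scaled conjugate gradients.
grid = [(ell, 1.0, noise) for ell in (0.5, 1.0, 2.0) for noise in (1e-4, 1e-3)]
gp = max((GPDynamics(Z, th) for th in grid), key=lambda g: g.log_likelihood())
print(gp.mean(Z[10]), Z[11])                                      # predicted vs. true next state
print(gp.variance(Z[10]), gp.variance(np.array([50.0, 50.0])))   # small near data, ~1 far away
```
      </preformat>
      <p>On this training loop the predicted mean at $z_{10}$ lands close to $z_{11}$ with a small variance, while far from the data the variance returns to the signal variance. For long sequences, Cholesky-based solves would be the more economical choice than repeated general solves.</p>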
      <p>The linear transformation and the Gaussian process described so
far are sufficient to model some motions. Given an initial pose $q_0$,
we seek $q_t$, the $t$-th pose following $q_0$. We start with $z_0 = f(q_0)$ and
iteratively use the mean of the Gaussian process to obtain $q_t$:
<disp-formula><tex-math>z_t = m(z_{t-1}), \qquad q_t = f^{-1}(z_t).</tex-math></disp-formula></p>
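      <p>The generation loop above can be sketched as follows. To keep the block self-contained, the hypothetical stand-ins are a random orthonormal projection from six pose angles to a two-dimensional latent space (for $f$ and $f^{-1}$) and a fixed rotation in place of a trained GP mean $m$; any function mapping $z_{t-1}$ to $z_t$ could be substituted.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def rollout(q0, f, f_inv, m, T):
    """Return the T poses following q0, stepping the latent state with the GP mean m."""
    z = f(q0)                      # z_0 = f(q_0)
    poses = []
    for _ in range(T):
        z = m(z)                   # z_t = m(z_{t-1}); only the mean is used, no sampling
        poses.append(f_inv(z))     # q_t = f^{-1}(z_t)
    return np.array(poses)

# Hypothetical stand-ins: an orthonormal 2 x 6 projection W for f, and a rotation for m.
W = np.linalg.qr(np.random.default_rng(0).normal(size=(6, 2)))[0].T

def encode(q):
    return W @ q            # f(q)

def decode(z):
    return W.T @ z          # f^{-1}(z)

angle = 2 * np.pi / 40
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle), np.cos(angle)]])

def mean_fn(z):
    return R @ z            # stand-in for a trained GP mean m(z)

q0 = decode(np.array([1.0, 0.0]))
Q = rollout(q0, encode, decode, mean_fn, T=40)
print(Q.shape, np.allclose(Q[-1], q0))   # one full cycle returns to the start pose
```
      </preformat>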
    </sec>
    <sec id="sec-3">
      <title>Switching Models at Contact</title>
      <p>The problem with the standard formulation of the GP model is that
it tends to smooth out sharp turns in the latent trajectory. Such
discontinuities are characteristic of the contact forces acting on a human in
motion, and these contacts should be modeled correctly as well.</p>
      <p>To address this, we add to the model a
switching variable $s \in S$ that divides the motion into smaller subsets.
Each value of $s$ describes a situation in which specific contact
forces act. For example, for walking we could
use four values, $S = \{\text{no contact}, \text{left foot}, \text{right foot}, \text{both}\}$.</p>
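      <p>One way to encode the four walking values of $S$ is sketched below, assuming per-frame left and right foot-contact flags are already available (for instance, hand-labeled); the <monospace>Contact</monospace> enum and <monospace>switch_value</monospace> helper are hypothetical.</p>
      <preformat preformat-type="code">
```python
from enum import Enum

class Contact(Enum):
    """Hypothetical encoding of S = {no contact, left foot, right foot, both}."""
    NONE = 0
    LEFT = 1
    RIGHT = 2
    BOTH = 3

def switch_value(left_down, right_down):
    # Two binary contact flags map one-to-one onto the four switching values.
    return Contact(int(left_down) + 2 * int(right_down))

# Hypothetical contact flags for a few frames of a walk (1 = foot on the floor).
left_flags = [1, 1, 0, 0, 1]
right_flags = [0, 1, 1, 1, 1]
labels = [switch_value(lf, rf) for lf, rf in zip(left_flags, right_flags)]
print([s.name for s in labels])   # ['LEFT', 'BOTH', 'RIGHT', 'RIGHT', 'BOTH']
```
      </preformat>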
      <p>
        The switching variable permits the decomposition of $p(z_{t+1} \mid z_t)$
into one conditional $p(z_{t+1} \mid z_t, s)$ per value $s \in S$, and thus the training of $|S|$ individual GP
models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In addition, the mapping from latent space coordinates to
switching values can be expressed as a GP classification problem, as
explained in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
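      <p>Training one GP per value of $S$ amounts to partitioning the consecutive pairs $(z_t, z_{t+1})$ by the switching value of the input frame $z_t$, consistent with the selection rule used during inference. A sketch with a hypothetical <monospace>split_by_switch</monospace> helper on toy data; each returned pair of arrays would then train its own GP.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def split_by_switch(Z, s):
    """Group consecutive pairs (z_t, z_{t+1}) by the switching value s_t of the input frame."""
    Z_minus, Z_plus, s_minus = Z[:-1], Z[1:], np.asarray(s)[:-1]
    return {v: (Z_minus[s_minus == v], Z_plus[s_minus == v]) for v in np.unique(s_minus)}

Z = np.arange(12, dtype=float).reshape(6, 2)   # toy 6-frame latent trajectory
s = [0, 0, 1, 1, 1, 0]                         # hypothetical per-frame switch labels
for v, (inputs, targets) in split_by_switch(Z, s).items():
    print(v, inputs[:, 0], targets[:, 0])      # training inputs and targets for GP number v
```
      </preformat>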
      <p>Inference in this model proceeds as in the previous
formulation, the main difference being the use of more than one
Gaussian process. When stepping in the latent space, $z_t = m(z_{t-1})$,
we use only the mean function of the GP trained for the
switching value of $z_{t-1}$.</p>
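      <p>The switching inference step can be sketched as follows. Two substitutions are made for brevity and are assumptions, not the method in the text: a nearest-neighbour lookup on labelled training points replaces the GP classifier, and hand-written per-switch mean functions replace the trained GPs. The toy one-dimensional trajectory rises steadily and then resets abruptly, a discontinuity that a single smooth GP would tend to round off.</p>
      <preformat preformat-type="code">
```python
import numpy as np

def nearest_label(z, Z_train, s_train):
    # Stand-in for the GP classifier: label of the closest labelled training point.
    return s_train[int(np.argmin(np.sum((Z_train - z) ** 2, axis=1)))]

def switching_rollout(z0, means, Z_train, s_train, T):
    """Step z_t = m_s(z_{t-1}), where s is the switching value inferred for z_{t-1}."""
    z, path, switches = np.asarray(z0, dtype=float), [], []
    for _ in range(T):
        s = nearest_label(z, Z_train, s_train)
        z = means[s](z)          # only the model trained for the current switch is used
        path.append(z)
        switches.append(s)
    return np.array(path), switches

def rise(z):
    return z + 1.0               # hypothetical smooth regime (s = 0)

def reset(z):
    return np.zeros_like(z)      # hypothetical abrupt regime (s = 1)

Z_train = np.arange(6, dtype=float)[:, None]   # toy latent states 0..5
s_train = [0, 0, 0, 0, 0, 1]                   # the top state triggers the reset
path, switches = switching_rollout([0.0], {0: rise, 1: reset}, Z_train, s_train, T=8)
print(path.ravel(), switches)
```
      </preformat>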
    </sec>
    <sec id="sec-4">
      <title>RESULTS</title>
      <p>Figures 2 and 3 show two examples where model switching is
beneficial: a walking motion and a paddling motion on a rowing
machine. Each figure shows 1000 poses inferred with a model
without the switching variable and 1000 poses inferred with the
switching variable. The two models differ in their ability to
model the discontinuities of the trajectory. For the walking sequence,
the main consequence of smoothing the discontinuities is increased
foot skating. For the paddling sequence, smoothing eliminates
the pause (red) between the pushing and pulling movements.</p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION</title>
      <p>We presented a way to model human motion and contact using
Gaussian processes and a switching variable. The models we
produce are useful for generating arbitrary-length sequences of cyclic
motion that can be adjusted to fit specific situations, with
the added benefit that the switching variable helps model
discontinuities due to contacts. One limitation is that we must label the
switches in the training data. As future work, it would be
interesting to use an unsupervised framework to choose labels that optimize
the fit of the model.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ji</surname>
          </string-name>
          .
          <article-title>Switching Gaussian process dynamic models for simultaneous composite motion tracking and recognition</article-title>
          . In
          <source>Computer Vision and Pattern Recognition</source>
          , pages
          <fpage>2655</fpage>
          -
          <lpage>2662</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Rasmussen</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. K. I.</given-names>
            <surname>Williams</surname>
          </string-name>
          .
          <article-title>Gaussian Processes for Machine Learning</article-title>
          . MIT Press,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Fleet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Hertzmann</surname>
          </string-name>
          .
          <article-title>Gaussian process dynamical models for human motion</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>30</volume>
          :
          <fpage>283</fpage>
          -
          <lpage>298</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>