<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using One-Shot Machine Learning to Implement Real-Time Multimodal Learning Analytics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael J Junokas</string-name>
          <email>junokas@illinois.edu</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Greg Kohlburn</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sahil Kumar</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Lane</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wai-Tat Fu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robb Lindgren</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Illinois</institution>
          ,
          <addr-line>1310 S. Sixth St. Rm. 388, Champaign, IL 61820</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Educational research has demonstrated the importance of embodiment in the design of student learning environments, connecting bodily actions to critical concepts. Gestural recognition algorithms have become important tools in leveraging this connection but are limited in their development, focusing primarily on traditional machine-learning paradigms. We describe our approach to real-time learning analytics, using a gesture-recognition system to interpret movement in an educational context. We train a hybrid parametric, hierarchical hidden-Markov model using a one-shot construct, learning from singular, user-defined gestures. This model gives us access to three different modes of data streams: skeleton positions, kinematic features, and internal model parameters. Such a structure presents many challenges, including anticipating the optimal feature sets to analyze and creating effective mapping schemas. Despite these challenges, our method allows users to engage in productive simulation interactions, fusing these streams into embodied semiotic structures defined by the individual. This work has important implications for the future of multimodal learning analytics and educational technology.</p>
      </abstract>
      <kwd-group>
        <kwd>educational technology</kwd>
        <kwd>gesture recognition</kwd>
        <kwd>one-shot machine learning</kwd>
        <kwd>cognitive embodiment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The connection between embodiment and human cognition has become increasingly
established within an array of academic domains [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ][
        <xref ref-type="bibr" rid="ref16">16</xref>
        ][
        <xref ref-type="bibr" rid="ref17">17</xref>
        ][
        <xref ref-type="bibr" rid="ref15">15</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Specifically,
educational research has shown the importance of embodiment in designing effective
student learning environments [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ][
        <xref ref-type="bibr" rid="ref19">19</xref>
        ][
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The idea that human cognition is
embedded in our bodily interactions with our physical environment has provoked the
need for technological tools that can explore, recognize, and apply users' movement in
educational interventions. The ability to reinforce the embodied nature of cognition and
to leverage it for effective learning has gained traction as the theoretical understanding
and technical capabilities of motion-capture systems expand. With limiting factors
eroding, we can start constructing models that fully take advantage of this developing
research and multimodal learning analytics (MMLA), creating applications that test the
bounds of embodied learning with interactive visualization technologies.
      </p>
      <p>
        While the opportunity to create more advanced models that engage MMLA for
embodied learning exists, the majority of interactive learning systems employ
direct-manipulation paradigms (e.g. pushing virtual buttons, grasping virtual objects) rather
than empowering learners to use expressive gestures based on their own embodied
intuitions [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ][
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. By using machine-learning algorithms, we are able to recognize and
analyze symbolic gestures, mining parameters from users' movement and providing them
with a rich and minimally constraining palette for interacting and developing within systems.
      </p>
      <p>Systems that utilize gesture-based input seem like a natural way to strengthen the
connection between body and mind in digital interaction, yet many challenges remain
for the users and designers of such systems. One of the most significant inhibitors to
effective interaction is the cognitive load it takes to memorize and perform precise,
predefined inputs. Users are often forced to fit their movement into templates that have
generalized away the physical nuance of their gestures. This prevents users from
developing any kind of meaningful semiotic structure that gives them access to
higher-level relationships and more abstract concepts, forcing them to resort to simple
and direct connections between movement and ideas. In a fluid and dynamic interaction
setting, especially where the development of a personalized language of movement would
promote more established semiotic structures, this is not optimal. Users would be forced
to learn a collective gesture library onto which they would have to project their
expressions, remaining contained within impersonal constructs that are removed from
their own conceptions.</p>
    </sec>
    <sec id="sec-2">
      <title>Proposed Model</title>
      <p>
        To empower users to define their own gestural relationships with symbols, we have
implemented a novel system that nurtures an effective learning environment through
real-time analytics, fostering embodied cognition. We train a hybrid hierarchical
hidden-Markov model (HHMM) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] with one-shot training [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], creating a system that is
defined by and specific to the user. By tailoring the interaction to the individual, users
are able to form stronger semiotic frameworks more quickly and with a greater level
of satisfaction using our one-shot model than with a traditionally trained version of the
same model.
      </p>
      <p>
        In order to collect motion data, we use the Microsoft Kinect V2 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] due to its favorable
robustness-to-cost ratio and its portability. The Kinect captures movement through the
generation of depth maps, utilizing a camera and an infrared sensor. From these depth
maps, a skeleton frame representing the spatial position of a subject can be extracted.
Using the Kinect's application programming interface, our data collection software uses
Open Sound Control [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] to send a 'skeleton' of joint positions to our feature-analysis
and gesture-recognition components. This results in our first data stream, skeleton
positions.
      </p>
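      <p>As a rough illustration of how this first stream can be consumed downstream, the following Python sketch receives joint positions over Open Sound Control using the python-osc library. The address pattern, message layout, and port are assumptions made for illustration and do not reflect the exact protocol of our collection software.</p>
      <preformat>
# Minimal sketch of consuming the skeleton-position stream over OSC.
# Assumes each message arrives at the address "/skeleton" carrying
# (joint name, x, y, z); address, layout, and port are illustrative.
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

latest_skeleton = {}  # joint name mapped to its most recent (x, y, z)

def on_joint(address, joint, x, y, z):
    # Store the latest camera-space position reported for this joint.
    latest_skeleton[joint] = (x, y, z)

dispatcher = Dispatcher()
dispatcher.map("/skeleton", on_joint)

# Listen for skeleton frames forwarded from the Kinect-facing software.
BlockingOSCUDPServer(("127.0.0.1", 8000), dispatcher).serve_forever()
</preformat>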
      <p>
        Our feature generation and data processing software is written in Max 6/Jitter [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
extracting an array of features including positional derivatives (e.g. velocity,
acceleration, jerk), comparative features (e.g. relative positions and their derivatives,
dot products between features), and statistical metrics (e.g. summation of speed across
joints as an analog for total body 'energy', means of feature windows), all at a variety of
timeframes concurrently. This results in our second data stream, kinematic features.
      </p>
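      <p>The sketch below gives a simplified sense of this second stream: given a window of joint positions, it derives velocity, acceleration, and jerk, a few comparative features, and a summed-speed 'energy' metric with a windowed mean. It is a NumPy stand-in, not the Max 6/Jitter implementation itself, and the particular joints, window size, and frame rate are assumptions.</p>
      <preformat>
import numpy as np

def kinematic_features(positions, dt=1.0 / 30.0, window=15):
    # positions: array of shape (frames, joints, 3) in camera space.
    # Positional derivatives via finite differences.
    vel = np.gradient(positions, dt, axis=0)
    acc = np.gradient(vel, dt, axis=0)
    jerk = np.gradient(acc, dt, axis=0)

    # Comparative features: positions relative to the first joint
    # (e.g. the spine base) and the dot product between the velocity
    # vectors of the first two joints.
    rel = positions - positions[:, :1, :]
    dots = np.einsum("fd,fd->f", vel[:, 0, :], vel[:, 1, :])

    # Statistical metrics: per-frame sum of joint speeds as an analog
    # for total body 'energy', plus a sliding-window mean of it.
    speed = np.linalg.norm(vel, axis=2)
    energy = speed.sum(axis=1)
    energy_mean = np.convolve(energy, np.ones(window) / window, mode="same")

    return {"vel": vel, "acc": acc, "jerk": jerk, "rel": rel,
            "dot": dots, "energy": energy, "energy_mean": energy_mean}
</preformat>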
      <p>
From these features, we train an HHMM to recognize and extract abstract model
features from user gestures. The user is able to define as many gestures as they wish,
training each class in succession, using only one example for each class. Once the user
has recorded their complete gesture space, an HHMM adapted from the IRCAM
'MUBU' package [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is trained. This model creates a machine-learned representation
of the user's interaction, generating temporal, probabilistic parameters. This results in
our third data stream, internal model parameters.
      </p>
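      <p>To make the one-shot construct concrete, the sketch below trains one model per gesture class from a single recorded example and scores new movement against every class. It substitutes a flat Gaussian hidden-Markov model from the hmmlearn library for the hierarchical MuBu-based model we actually use, so the state count and model family are simplifying assumptions.</p>
      <preformat>
from hmmlearn.hmm import GaussianHMM

class OneShotGestureModel:
    # Simplified stand-in for the HHMM described above: one flat
    # Gaussian HMM per gesture class, each trained on the single
    # user-provided example (one-shot).

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.models = {}

    def add_gesture(self, label, example):
        # example: (frames, features) array holding one recorded gesture.
        hmm = GaussianHMM(n_components=self.n_states,
                          covariance_type="diag", n_iter=20)
        hmm.fit(example)  # one-shot training on a single example
        self.models[label] = hmm

    def classify(self, observation):
        # Return the best-matching label plus per-class log-likelihoods,
        # a reduced form of the internal-model-parameter stream.
        scores = {lbl: m.score(observation) for lbl, m in self.models.items()}
        return max(scores, key=scores.get), scores
</preformat>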
      <p>After training, we combine these data streams to create a hybrid model that classifies
gestures, extracts abstract model parameters, defines kinematic features, and measures
summary statistics. This model can then be used to recognize and analyze users'
gestures, allowing them to begin developing semiotic structures between abstract
concepts and their movement. While these streams all ultimately rely on a singular
source, they provide unique analytic opportunities and are thus considered different
modes.</p>
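      <p>Reusing the two sketches above, the fragment below shows one hypothetical way the three streams could be fused for a single movement window; the observation vector, dictionary layout, and naming are illustrative rather than a description of our implementation.</p>
      <preformat>
def analyze_window(positions, model):
    # Fuse the three streams for one movement window: raw skeleton
    # positions, derived kinematic features, and the trained model's
    # classification and log-likelihoods.
    feats = kinematic_features(positions)
    obs = feats["vel"].reshape(len(positions), -1)  # per-frame observation
    label, log_likelihoods = model.classify(obs)
    return {
        "skeleton": positions,              # stream 1: skeleton positions
        "kinematics": feats,                # stream 2: kinematic features
        "model": {"gesture": label,         # stream 3: internal model parameters
                  "log_likelihoods": log_likelihoods},
    }
</preformat>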
    </sec>
    <sec id="sec-3">
      <title>Model Challenges</title>
      <p>
        While we have shown that our model can be successful in experimental tasks,
immersive simulation theatres, and several other applications [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], we continue to
address a variety of challenges, including discovering the optimal set of features and
creating the most effective mapping schemas to benefit the learners.
      </p>
      <p>The biggest challenge we face is determining the most relevant features that engage
the user and provide analytic insight. While we can gather and analyze a multitude of
features, determining the most salient features and how those features interact is an
opportunity for realizing a model that is more effective in performance and accuracy. For our
initial work, we chose to create a robust model that incorporates as many features as
we could collect in order to empirically determine the best feature set. We've
experimented with a variety of approaches to focusing our features, including setting
empirically determined thresholds, selecting only statistically significant features, and
providing users with the agency to select features of their choosing. While we've found
some success with these methods, more investigation needs to be done.</p>
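      <p>Two of these focusing strategies, an empirically set threshold and direct user selection, are sketched below; the variance criterion, default threshold, and selection interface are illustrative assumptions rather than the procedure we settled on.</p>
      <preformat>
import numpy as np

def select_features(feature_matrix, names, threshold=None, keep=None):
    # feature_matrix: (frames, features); names: feature labels.
    if keep is not None:
        # User-selected features: keep exactly what the user asked for.
        idx = [names.index(n) for n in keep]
    else:
        # Empirically set threshold on per-feature variance.
        variances = feature_matrix.var(axis=0)
        cutoff = threshold if threshold is not None else np.median(variances)
        idx = list(np.where(variances >= cutoff)[0])
    return feature_matrix[:, idx], [names[i] for i in idx]
</preformat>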
      <p>While we are able to extract a variety of features, connecting them to the users
beyond intuitive responses has proven elusive. The ability to adapt our analytics in
real time relies heavily on an efficient mapping schema that takes full advantage of all of
the features we are extracting, physical and abstract. Additionally, this is essential to
creating a more intuitive and empowering interaction for the user. In order to address
this, we've incorporated real-time, visual feedback into our training and simulations,
reinforcing user interaction through performance.</p>
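      <p>As a toy example of such a mapping schema, the fragment below drives two visual parameters from the fused analysis of the earlier sketches, switching the displayed concept by the recognized gesture and scaling intensity by total body 'energy'; the normalization constant and parameter names are assumptions.</p>
      <preformat>
def map_to_visuals(analysis):
    # analysis: the dictionary returned by analyze_window above.
    energy = float(analysis["kinematics"]["energy_mean"][-1])
    return {
        "concept": analysis["model"]["gesture"],  # which symbol to display
        "intensity": min(1.0, energy / 10.0),     # normalization assumed
    }
</preformat>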
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The ability to ground abstract concepts in tangible, intuitive, embodied metaphors is a
vital foray into advancing effective education and other applied interventions. By
creating a model in which the user can specifically and quickly define their interactions
through the fusion of different data streams, we offer a novel tool that begins to explore
one-shot learning as a means to real-time multimodal learning analytics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. MUBU for Max. http://forumnet.ircam.fr/product/mubu-en/. (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <article-title>Developing with Kinect for Windows</article-title>
          . https://developer.microsoft.com/en-us/windows/kinect/develop. (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. ELASTIC3S. http://elastics.education.illinois.edu/home/about/. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Morphew</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mathayas</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alameh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lindgren</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Exploring the relationship between gesture and student reasoning regarding linear and exponential growth</article-title>
          .
          <source>Proceedings of the International Conference of the Learning Sciences. (</source>
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Barsalou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <article-title>Grounded cognition</article-title>
          .
          <source>Annu. Rev. Psychol</source>
          .
          <volume>59</volume>
          (
          <year>2008</year>
          ),
          <fpage>617</fpage>
          -
          <lpage>645</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Biskupski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fender</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feuchtner</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karsten</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Willaredt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Drunken Ed: a balance game for public large screen displays</article-title>
          .
          <source>CHI '14 Extended Abstracts on Human Factors in Computing Systems. ACM</source>
          , (
          <year>2014</year>
          ),
          <fpage>289</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fergus</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Peronna</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>One-shot learning of object categories</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence. 28</source>
          ,
          <issue>4</issue>
          , (
          <year>2006</year>
          ),
          <fpage>594</fpage>
          -
          <lpage>611</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fine</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tishby</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <article-title>The hierarchical hidden Markov model: Analysis and applications</article-title>
          .
          <source>Machine learning 32, 1</source>
          , (
          <year>1998</year>
          ),
          <fpage>41</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gallagher</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>How the body shapes the mind</article-title>
          . Cambridge Univ Press.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Glenberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>Embodiment for education</article-title>
          .
          <source>Handbook of cognitive science: An embodied approach</source>
          . (
          <year>2008</year>
          ),
          <fpage>355</fpage>
          -
          <lpage>372</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Goldin-Meadow</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>Learning through gesture</article-title>
          .
          <source>Wiley Interdisciplinary Reviews: Cognitive Science</source>
          .
          <volume>2</volume>
          ,
          <issue>6</issue>
          , (
          <year>2011</year>
          ),
          <fpage>595</fpage>
          -
          <lpage>607</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Isbister</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlesky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frye</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Scoop!: a movement-based math game designed to reduce math anxiety</article-title>
          .
          <source>CHI' 12 Extended Abstracts on Human Factors in Computing Systems. ACM</source>
          , (
          <year>2012</year>
          ),
          <fpage>1075</fpage>
          -
          <lpage>1078</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>The body in the mind: The bodily basis of meaning, imagination, and reason</article-title>
          . University of Chicago Press.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Junokas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Linares</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lindgren</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Developing gesture recognition capabilities for interactive learning systems: Personalizing the learning experience with advanced algorithms</article-title>
          .
          <source>Proceedings of the International Conference of the Learning Sciences. (</source>
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lakoff</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Metaphors we live by</article-title>
          . University of Chicago Press.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>O'Loughlin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Embodiment and education</article-title>
          .
          <volume>15</volume>
          , (
          <year>2006</year>
          ), Springer.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Education and the Philosophy of the Body: Bodies Knowledge and Knowledges of the Body</article-title>
          .
          <source>Knowing bodies, moving minds</source>
          . (
          <year>2004</year>
          ),
          <fpage>13</fpage>
          -
          <lpage>27</lpage>
          , Springer.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Puckette</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          Max/MSP (Version 6). Cycling '74. (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Gestures: Their role in teaching and learning</article-title>
          .
          <source>Review of Educational Research</source>
          .
          <volume>71</volume>
          ,
          <issue>3</issue>
          (
          <year>2001</year>
          ),
          <fpage>365</fpage>
          -
          <lpage>392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freed</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and others.
          <article-title>Open sound control: A new protocol for communicating with sound synthesizers</article-title>
          .
          <source>Proceedings of the 1997 International Computer Music Conference</source>
          . (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>