<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Pre-conference Workshop), March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Qualitative Parameter Triangulation: A Formulated Approach to Parameterize Multimodal Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yeyu Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrew R. Ruis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Williamson Shaffer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Wisconsin - Madison</institution>
          ,
          <addr-line>1025 W Johnson St, Madison, WI, USA, 53703</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>14</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Data fusion and parameterization based on qualitative insights are two key challenges in multimodal learning analytics. In this study, we propose Qualitative Parameter Triangulation (QPT) to address these two challenges. In particular, QPT generates optimized parameter values for multimodal learning models that are event-based, process-oriented, and connection-structured with respect to recent temporality.</p>
      </abstract>
      <kwd-group>
        <kwd>Quantitative Ethnography</kwd>
        <kwd>Methodology</kwd>
        <kwd>Model Elicitation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Multimodalities</title>
        <p>
          Humans interact and communicate through various modes. According to [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], a mode refers to channels of representation that are socially and culturally shared. These socially shaped representations serve different functions in communication processes [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. In his book, Kress argues that "in communication several modes are always used together, in modal ensembles, designed so that each mode has a specific task and function" (p. 28). Thus, one of the key questions in multimodality studies is modal affordance, which characterizes the "reach" of one mode in influencing others.
        </p>
        <p>
          However, multimodality is not a simple sum of various modes. Instead, multimodality studies the relationships between different modes [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. According to [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], multimodality is defined as "an inter-disciplinary approach drawn from social semiotics that understands communication and representation as more than language and attends systematically to the social interpretation of a range of forms of making meaning" (p. 250). This definition creates a link between communication and learning. Social semiotics, as defined by [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], is a product of knowledge construction: [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] conceptualizes knowledge as a problem-solving tool, created through multimodal representations based on their modal affordances. Thus, multimodal learning analytics studies cross-modal interactions during learning processes.
        </p>
        <p>
          To study the interactional processes of learning, [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] emphasizes the importance of a process-oriented approach. He conducted multimodal learning analytics in the context of an engineering challenge and compared non-process-oriented and process-oriented approaches. The process-oriented approach provided critical insights about the characteristics of learners, as "planner" and "thinker", which were not manifested in the non-process-oriented approach. Specifically, he claims that temporality and sequence in learning activities are essential in process-oriented multimodal learning models. Similarly, [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] argues that multimodal communication and interaction are provisional and temporality-critical. That is, communication theories assume that humans evaluate social scenarios and shape their communicational encounters within a recent temporal frame. Thus, to study the process of multimodal learning is to investigate relationships across modes and connections between events that are temporally organized.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Challenges in Multimodal Models</title>
        <p>Representing an event-based, process-oriented, and connection-structured multimodal learning process poses two major challenges.</p>
        <p>
          The first challenge is how to fuse multimodal data with varying time scales and frequencies. Depending on the utilities and assumptions involved, there are three categories of data fusion: naive fusion, low-level fusion, and high-level fusion [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Naive fusion is commonly used in exploratory studies, in which data are aggregated into features without specific assumptions. In low-level fusion, with prior knowledge and assumptions about the data, researchers construct features on a small time scale that describe the relationships across events. High-level fusion requires more assumptions and theoretical foundations about turning data into meaning. According to [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], data fusion faces challenges in both collection and modeling. Noncommensurability and incompatible data sizes are issues for data collected from different instruments and devices. Noncommensurability refers to the issue that the raw formats of data cannot be directly compared. For example, data collected from electrodermal activity (EDA) is not directly related to eye movements in a study of mind wandering, which requires a first step of transformation [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Also, due to different observational modes, the size of data samples varies, which may result in large uncertainty and bias in modeling.
        </p>
        <p>
          The second challenge is how to elicit quantitative parameters based on qualitative understanding; that is, in the operationalization step, how to transform qualitative information into quantitative parameters for further modeling. This challenge is not uncommon in mixed-method studies, and solutions exist in unimodal analysis. For example, quantitative ethnography [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] defines the mechanical grip between observations and interpretations as codes, which transform qualitative records into binary numbers. Additionally, the operationalization of common ground in a discourse is defined as a window function [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], which assumes that codes within the recent temporal context are connected to each other. Here, the size of the window is an operationalization of recent temporal context: within the window, connections between codes are present; outside it, there are no connections. However, these methods are more easily applied to unimodal datasets. With multimodality, the complexity of eliciting a parameter increases due to interactions and the irregularity of various modes. For example, it becomes a challenge for qualitative researchers or domain experts to elicit the relationship between modes: how many times longer is the impact of an eye-gaze event compared to that of a log-data event?
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. A Solution: Qualitative Parameter Triangulation (QPT)</title>
        <p>We propose a formulated approach called Qualitative Parameter Triangulation (QPT) to address
the two challenges above.</p>
        <p>
          First, modeling with QPT does not require data fusion; instead, QPT helps determine parameter values in a pre-defined function. That is, the dataset can preserve its raw representation as long as it meets the requirements of evidentiary completeness, ontological consistency, and terminological consistency [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. For example, data collected from different streams can be organized into a data spreadsheet: each row contains all kinds of information, while each column contains one type of information. Instead of generating aggregated features as in traditional data fusion, QPT facilitates parameterization by describing the relationships between modes and events as mathematical functions. As mentioned above, the temporal impact between two events can be operationalized as a window, which is a step function in its mathematical form. Based on the theory in communication sciences that each mode serves a different function, we can vary the mathematical function to describe the relationship between events for each mode. In the simplest example, we can vary the length of the window to describe the survival of impact for different modes.
        </p>
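        <p>As a minimal sketch of this idea, a mode-specific window can be written as a step function of elapsed time. The mode names and window lengths below are illustrative assumptions, not values from this paper.</p>

```python
# Illustrative sketch: each mode's temporal impact as a step (window) function.
# The mode names and window lengths are hypothetical examples.

WINDOW_SECONDS = {"eye_gaze": 10.0, "log": 3.0}

def has_impact(mode: str, elapsed: float) -> bool:
    """Step function: an event of `mode` impacts any event occurring within
    the mode-specific window; outside the window, the impact is zero."""
    return 0.0 <= elapsed <= WINDOW_SECONDS[mode]
```

        <p>Under these assumed windows, an eye-gaze event still influences an event 5 seconds later, while a log event does not.</p>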
        <p>Second, QPT provides a formulated structure to help researchers elicit their hypotheses from qualitative data. Instead of asking directly about the relationship between two modes, QPT automatically constructs networks based on the qualitative researcher's narratives and optimizes parameter values for next-step modeling.</p>
      </sec>
      <sec id="sec-1-4">
        <title>1.4. Usage and Combination with Other Models</title>
        <p>
          QPT can be combined with any model that is event-based, process-oriented, and connection-structured with consideration of temporality, such as lag sequential pattern mining, process mining, etc. In this paper, we use Epistemic Network Analysis (ENA) as an example to demonstrate how QPT can be used to determine parameters: the window sizes of different modes. We select ENA for its affordances in modeling interactivity and interdependence in problem-solving processes [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]; it is event-based, process-oriented, and temporally connection-structured, which aligns with the context of the test dataset. The demonstration dataset is collected from a puzzle-solving game called Baba Is You; see the next section for more details.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. QPT Approach</title>
      <sec id="sec-2-1">
        <title>2.1. Overview</title>
        <p>QPT helps elicit assumptions based on qualitative understanding and optimizes the parameters using an automated approach for further modeling. For example, if the goal is to determine the length of the active-impact window for different modes, the inputs include (1) a human's qualitative interpretation of the connections made, given randomly sampled time points, and (2) the number of parameters. QPT then outputs the optimized window size for each mode, which can be used as the window parameters in ENA. We refer to three key concepts in this triangulation: the qualitative story, the network representation (connections), and parameter determination. By optimizing the parameters, QPT minimizes the differences between the qualitative story and the quantitative connections.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Worked Example: Determining Window Sizes for Two Modes Using QPT</title>
        <p>Start with a multimodal dataset with evidentiary completeness, ontological consistency, and terminological consistency. Define the parameters needed to describe the impacts of different modes. In this worked example, we define two parameters, one for eye-gaze events and one for log-data events.</p>
        <p>Step 1: Randomly select K lines from the whole dataset.</p>
        <p>Step 2: For each selected line, treated as a referring line, have the qualitative researcher tell a story about the learning event. For example, in the context of a digital learning game, line 10 is an eye-gaze event that captures the player looking at a specific object. The qualitative researcher elaborates on their understanding of why the player looked at that object. Then, based on the content of the researcher's narrative, we can determine the connections between codes in a network representation. In this example, there is a connection from code A to itself and a connection from code B to code A. Thus, entries (1,1) and (2,1) in the adjacency matrix are marked as 1; the remaining entries are marked as 0.</p>
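        <p>The coding in Step 2 can be sketched as follows; the code set and the connection list are the hypothetical ones from the example above.</p>

```python
# Sketch of Step 2: turn the connections named in a researcher's narrative
# into a binary adjacency matrix. Codes A and B are the example codes above.

CODES = ["A", "B"]  # code A -> row/column 1, code B -> row/column 2

def narrative_to_adjacency(connections, codes=CODES):
    """connections: (source_code, target_code) pairs drawn from the story."""
    idx = {c: i for i, c in enumerate(codes)}
    adj = [[0] * len(codes) for _ in codes]
    for src, tgt in connections:
        adj[idx[src]][idx[tgt]] = 1
    return adj

# A connects to itself and B connects to A: entries (1,1) and (2,1) are 1.
label = narrative_to_adjacency([("A", "A"), ("B", "A")])
```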
        <p>Step 3: Repeat Step 2 K times. The coded network structures serve as K ground-truth labels.</p>
        <p>Step 4: Use an automated optimization algorithm to determine the parameters for eye gaze and log data that result in the least difference between the ground-truth labels (A(k)) and the estimated networks (Â(k)) obtained by plugging the parameters into ENA. For example, let p(eye) be the window-of-impact parameter for the eye-gaze mode. If p(eye) is 10, the model assumes that one eye-gaze event has an approximately active impact on all events happening in the next 10 seconds. Similarly, we optimize the log-data parameter to derive the complete model.</p>
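        <p>A minimal sketch of how estimated connections could follow from window parameters is given below, under an assumed event format of (timestamp, mode, code index). This illustrates the windowed-connection idea only; it is not the actual ENA implementation.</p>

```python
# Sketch: estimate connections to a referring line from window parameters.
# A prior event connects its code to the referring line's code if the
# referring line falls inside that event's mode-specific impact window.

def estimated_adjacency(events, ref_index, windows, n_codes):
    """events: time-ordered list of (time_seconds, mode, code_index)."""
    adj = [[0] * n_codes for _ in range(n_codes)]
    t_ref, _, code_ref = events[ref_index]
    for t, mode, code in events[: ref_index + 1]:
        if t_ref - t <= windows[mode]:   # event's impact is still active
            adj[code][code_ref] = 1
    return adj

# Hypothetical data: with a 10 s eye-gaze window and a 3 s log window, only
# the recent eye-gaze event (and the referring line itself) connect.
events = [(0.0, "log", 1), (8.0, "eye_gaze", 0), (9.0, "log", 0)]
est = estimated_adjacency(events, ref_index=2,
                          windows={"eye_gaze": 10.0, "log": 3.0}, n_codes=2)
```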
        <p>After QPT provides optimized parameters, researchers close the interpretive loop by checking whether the parameters align with their original understanding. With validated parameters and interpretive alignment, the parameter values are used to construct a multimodal ENA model.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Mathematical Notations</title>
        <sec id="sec-2-3-1">
          <title>2.3.1. Human Labels of Connections</title>
          <p>We randomly select K lines from the dataset regardless of mode. Let a randomly selected line be the referring line r(k), k ∈ {1, 2, 3, ..., K}. Each r(k) has a corresponding adjacency matrix that represents which connections were made between any two codes, determined by the qualitative understanding of researchers or domain experts. Let A(k) be this matrix, which represents the presence of connections between codes. Let A_ij(k) be the binary value indicating the presence of a connection between code i and code j, given the referring line r(k). If A_ij(k) = 1, there is a connection between code i and code j; otherwise, there is no connection between code i and code j.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>2.3.2. Deriving Parameters for Diferent Modes</title>
          <p>A multimodal dataset may include multiple modes. For each mode m, we determine one parameter p(m) as the length of a temporal window, which describes how long an event from that mode has impacts on other events. Let P be the vector recording the parameters for all modes.</p>
          <p>QPT optimizes a vector of parameters P*, which represents the active-impact windows of the different modes with the least error. Given any model M that describes the interdependence between two events using connections, we can plug P* into M; that is, Â(k) = M(r(k), P*). To derive P*, let A(k) (k ∈ {1, 2, 3, ..., K}) be the ground truth, and define L(A(k), Â(k)) as a loss function describing the sum of differences between A(k) and Â(k). We optimize by:</p>
          <p>P* = argmin_P ∑_k L(A(k), M(r(k), P))</p>
          <p>Starting from random values P_0, a gradient descent algorithm converges to a local minimum of L.</p>
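          <p>The optimization can be sketched as follows. Because window parameters enter through step functions, the loss is piecewise constant in P, so this sketch substitutes an exhaustive grid search for the gradient descent described in the text; the toy model, candidate windows, and ground-truth data are all hypothetical.</p>

```python
from itertools import product

def loss(truth, est):
    """Sum of absolute entry-wise differences between adjacency matrices."""
    return sum(abs(t - e) for rt, re in zip(truth, est)
                          for t, e in zip(rt, re))

def optimize_windows(ground_truths, model, candidates):
    """ground_truths: list of matrices A(k); model(P) returns estimated
    matrices for every referring line; candidates: mode -> candidate window
    lengths. Returns the parameter assignment with the smallest summed loss."""
    modes = list(candidates)
    best, best_loss = None, float("inf")
    for values in product(*(candidates[m] for m in modes)):
        P = dict(zip(modes, values))
        total = sum(loss(a, e) for a, e in zip(ground_truths, model(P)))
        if total < best_loss:
            best, best_loss = P, total
    return best

# Toy model: the single ground-truth connection appears only when the
# (hypothetical) eye-gaze window is at least 5 seconds.
truths = [[[1, 0], [0, 0]]]
def toy_model(P):
    return [[[1 if P["eye_gaze"] >= 5 else 0, 0], [0, 0]]]
best = optimize_windows(truths, toy_model, {"eye_gaze": [2, 5, 10]})
```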
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Discussion</title>
      <p>In this paper, we propose a method called Qualitative Parameter Triangulation to determine the parameter values in a multimodal learning model. Specifically, this approach addresses the challenges of data fusion and parameterization based on qualitative insights. Additionally, the key concept of engaging qualitative researchers in the loop ensures interpretive alignment, which offers the potential for closing the feedback loop with other stakeholders in a multimodal study. Future work is as follows: (1) use empirical data to test the efficacy of QPT; (2) try different models besides ENA; (3) try different methods of optimization (such as using Gibbs sampling to estimate the parameters of different modes iteratively); and (4) create a multimodal interface to facilitate assumption elicitation for research efficiency and closing loops in human-computer interactions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Van Leeuwen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kress</surname>
          </string-name>
          ,
          <article-title>Discourse semiotics</article-title>
          ,
          <source>Discourse studies: A multidisciplinary introduction 2</source>
          (
          <year>2011</year>
          )
          <fpage>107</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kress</surname>
          </string-name>
          ,
          <article-title>Multimodality: A social semiotic approach to contemporary communication</article-title>
          ,
          <source>Routledge</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jewitt</surname>
          </string-name>
          ,
          <article-title>Multimodal methods for researching digital technologies</article-title>
          ,
          <source>The SAGE handbook of digital technology research</source>
          (
          <year>2013</year>
          )
          <fpage>250</fpage>
          -
          <lpage>265</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jewitt</surname>
          </string-name>
          ,
          <article-title>Multimodal analysis, in: The Routledge handbook of language and digital communication</article-title>
          ,
          <source>Routledge</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Blikstein</surname>
          </string-name>
          ,
          <article-title>Multimodal learning analytics</article-title>
          ,
          <source>in: Proceedings of the third international conference on learning analytics and knowledge</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>102</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Worsley</surname>
          </string-name>
          ,
          <article-title>Multimodal learning analytics as a tool for bridging learning theory and complex learning behaviors</article-title>
          ,
          <source>in: Proceedings of the 2014 ACM workshop on Multimodal Learning Analytics Workshop and Grand Challenge</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lahat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Adali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jutten</surname>
          </string-name>
          ,
          <article-title>Multimodal data fusion: an overview of methods, challenges, and prospects</article-title>
          ,
          <source>Proceedings of the IEEE</source>
          <volume>103</volume>
          (
          <year>2015</year>
          )
          <fpage>1449</fpage>
          -
          <lpage>1477</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Brishtel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dingler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ishimaru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dengel</surname>
          </string-name>
          ,
          <article-title>Mind wandering in a multimodal reading setting: Behavior analysis &amp; automatic detection using eye-tracking and an eda sensor</article-title>
          ,
          <source>Sensors</source>
          <volume>20</volume>
          (
          <year>2020</year>
          )
          <fpage>2546</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Shaffer</surname>
          </string-name>
          ,
          <article-title>Quantitative ethnography</article-title>
          ,
          <source>Lulu.com</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ruis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siebert-Evenstone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pozen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Shaffer</surname>
          </string-name>
          ,
          <article-title>Finding common ground: A method for measuring recent temporal context in analyses of complex, collaborative thinking</article-title>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Swiecki</surname>
          </string-name>
          ,
          <article-title>Measuring the impact of interdependence on individuals during collaborative problem-solving</article-title>
          ,
          <source>Journal of Learning Analytics</source>
          <volume>8</volume>
          (
          <year>2021</year>
          )
          <fpage>75</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>