<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ready for Multimodal Interaction: Integrating Text- and Voice Chat into Hyperchalk</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hendrik Drachsler</string-name>
          <email>h.drachsler@dipf.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Gombert</string-name>
          <email>s.gombert@dipf.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukas Menzel</string-name>
          <email>menzel@sd.uni-frankfurt.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Onur Karademir</string-name>
          <email>o.karademir@dipf.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Di Mitri</string-name>
          <email>d.dimitri@dipf.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIPF: Leibniz Institute for Research and Information in Education</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Goethe Universität Frankfurt</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Hyperchalk is an open-source online whiteboard aimed at use cases in education and developed with Learning Analytics in mind. It allows users to sketch and draw in a collaborative environment. Similar to commercial whiteboard tools, this allows the implementation of various collaborative learning tasks. Unlike commercial solutions, however, the tool collects trace data in the background, which can be used to study learners' collaboration processes and to calculate metrics that inform teachers about how their learners collaborate. Collaboration, however, always involves communication between collaborators, and analyzing this communication is crucial for understanding the emerging learning processes. So far, this was a blind spot of Hyperchalk, as the tool collected only what happened on the whiteboard itself. In this paper, we present an updated version of the tool that lets users communicate directly within Hyperchalk through newly added text and voice chat functionalities. In particular, we illustrate the technical architecture underlying these added features.</p>
      </abstract>
      <kwd-group>
        <kwd>Online Whiteboard</kwd>
        <kwd>Multimodal Learning Analytics</kwd>
        <kwd>Voice Chat</kwd>
        <kwd>Text Chat</kwd>
        <kwd>Computer-supported collaborative learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Hyperchalk is an open-source online whiteboard. It is intended to function as a tool for
implementing dynamic, collaborative learning activities in computer-supported collaborative
learning scenarios and collects rich log data about the collaboration processes to understand
the learners better and support them through Learning Analytics [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, so far, this data
collection has ignored essential components of computer-supported collaborative learning: oral
and written communication. In current synchronous online learning scenarios, communication
is mainly managed through video conferencing tools, which provide limited access to
communication data. For Learning Analytics research, this can be problematic as it can be hard to
fully understand the collaboration processes that occur during collaborative learning activities
without analysing the actual communication of the people involved [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>For this reason, we updated Hyperchalk with voice-over-IP and text chat functionalities.
Users can now use these functionalities on all whiteboards, while instructors can optionally
deactivate them. For implementing these features, we chose a centralized approach in which
all communication between learners is sent over the Hyperchalk server. This guarantees
that it can be fully logged and is available for real-time or post-hoc analyses. Moreover, to
allow practitioners to create interactive learning experiences with Hyperchalk, we offer APIs
to integrate external bots with the tool that can send learners text and voice messages and
manipulate the board state.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        Work on computer-supported collaborative learning is often based on constructivist pedagogy.
Put simply, this school of thought holds that learners do not simply
learn by receiving transmitted learning content. Instead, they need to (re-)construct their
understanding of what is to be learned from the input they receive. Following this notion,
in computer-supported collaborative learning scenarios, it is often emphasized that learners
communicate to help extend each other’s understanding of a given topic [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        In collaborative learning scenarios, learners often take on different team roles [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These roles
influence what knowledge learners can acquire in these scenarios and to which degree. Sometimes,
such roles are assigned explicitly, but more often, they emerge naturally from the individual
group dynamics. Dowell et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] found that such emerging roles can be identified in multiparty
communication using vector-based natural language processing methods. Menzel
et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] followed up on this work by identifying roles as a basis for providing learners with
role-specific feedback.
      </p>
      <p>
        Online whiteboards allow the implementation of a whole range of different collaborative
activities. As on analog whiteboards, users can collaboratively draw, sketch and illustrate in
group contexts [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This usually goes hand-in-hand with discussions and extensive
communicative processes. Consequently, computer-supported collaborative learning is often highly
multimodal by nature [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and whiteboard-based learning tasks are no exception. We suspect
that analyzing both learners’ communication and interactions on the whiteboard is crucial for
understanding the entire learning processes that emerge during such scenarios.
      </p>
      <p>
        Multimodal learning analytics is the field of research dedicated to studying and understanding
multimodal data produced by learners during such scenarios [
        <xref ref-type="bibr" rid="ref8">8</xref>
          ]. In this context, examining
multiple modalities aims at a complete picture of what happens during learning, as
different aspects of learning can be expressed through different modalities. On the one hand,
this involves the products learners create during learning activities, which can themselves be
multimodal. On the other hand, learning processes can be expressed through different modalities,
particularly during computer-supported collaborative learning, where communication plays a
key role [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We suspect that multimodal indicators extracted from both learners’ whiteboard
interactions and communication can be utilized to identify learner roles and to provide learners
with individualized feedback similar to Menzel et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>To make Hyperchalk ready for multimodal learning analytics in computer-supported
collaborative learning scenarios, we extended the tool with voice and text chat support, allowing the
collection of learner data from these modalities, which can then be analyzed in combination
with trace data from the whiteboard itself.</p>
      <p>To make the chat functionalities available to users, we integrated two additional buttons
seamlessly into the existing user interface by placing them in the tool’s menu bar. Figure 1
depicts a screenshot of the updated menu bar of Hyperchalk with these additional buttons.</p>
      <p>The first button (1) opens a chat window where users can exchange text messages. They are
also provided with a simple emoji picker similar to the ones found in established messaging
apps. Moreover, they can send images and files, and a link preview function is included. This aims
to make the user experience similar to established messengers so that users are not tempted to
switch to external tools instead of our integrated chat function.</p>
      <p>The second button (2) allows users to turn their microphones on and off. This feature can
be used immediately after granting the browser access to the users’ microphones. All users
automatically join the voice chat when they enter the room and can hear other participants
whose microphones are turned on. This is intended to keep the barriers for users to engage in
communication low.</p>
      <p>Both communication features use a centralised approach. This means all communication is
sent to the server, which broadcasts it to the other users in a room. The reasoning is to make all
communication traceable without encountering the synchronization issues that are more likely
to emerge with a decentralized approach. In the following sections, we describe the technical
properties of both functions.</p>
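      <p>The centralised relay model described here can be sketched as follows. This is a minimal illustration only; the class and method names are our assumptions and not Hyperchalk’s actual code.</p>

```python
# Minimal sketch of a centralised room relay: every message reaches the
# server, is appended to a log for later analysis, and is then fanned out
# to all other clients in the room. Names are illustrative only.
from collections import defaultdict


class Room:
    def __init__(self, room_id):
        self.room_id = room_id
        self.clients = set()   # ids of connected clients
        self.log = []          # full communication trace for analytics

    def join(self, client_id):
        self.clients.add(client_id)

    def broadcast(self, sender, payload, outboxes):
        """Log the message, then relay it to every client except the sender."""
        self.log.append((sender, payload))
        for client in self.clients:
            if client != sender:
                outboxes[client].append(payload)


# One Room per board session, created on demand.
rooms = defaultdict(lambda: Room("default"))
```

      <p>In the real system, the outboxes would correspond to open client connections; the important property of the centralised design is that the server sees, and can log, every message before relaying it.</p>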
      <sec id="sec-3-1">
        <title>3.1. Text Chat</title>
        <p>
          For transmission of chat messages, we use a simplified version of the XMPP protocol [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ],
an established open text messaging standard. Messages are sent to the server, which then
broadcasts them to the other clients in the session. Moreover, all chat messages are stored in the
database for later analysis. The replay mode of Hyperchalk also supports playback of the chats.
        </p>
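        <p>A simplified XMPP-style chat message, as relayed and stored for later replay, can be sketched as follows. The field names and storage format are assumptions for illustration, not Hyperchalk’s actual wire format.</p>

```python
# Hedged sketch of a simplified XMPP-style message stanza: built on the
# client, relayed by the server, and persisted with a timestamp so the
# replay mode can play chats back in order. Field names are illustrative.
import json
import time
import uuid


def make_stanza(sender, room, body):
    """Build a simplified XMPP-like chat message."""
    return {
        "id": str(uuid.uuid4()),   # unique message id
        "type": "chat",
        "from": sender,
        "to": room,
        "body": body,
        "ts": time.time(),         # timestamp used for replay ordering
    }


def store_stanza(db, stanza):
    """Persist the message (here: a list standing in for the database)."""
    db.append(json.dumps(stanza))


def replay(db):
    """Return stored messages in chronological order, as in replay mode."""
    return sorted((json.loads(s) for s in db), key=lambda m: m["ts"])
```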
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Voice Chat</title>
        <p>
          We use WebSockets to transmit audio messages between the server and clients. All audio
messages are stored on the server as well. For this purpose, incoming audio streams are buffered
and stored with a timestamp. A continuously running background task then segments the buffered
audio streams into individual speech segments using Pyannote [
          <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
          ] and Diart [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The audio
is encoded using the Opus codec and stored as OGG files on the server using Pedalboard [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
An individual ID and a timestamp are generated and stored in the database for documentation
for each file. Figure 2 depicts this architecture schematically.
        </p>
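        <p>The segmentation step can be sketched as follows. A trivial energy threshold stands in for the Pyannote/Diart models named above, and all function and parameter names are illustrative assumptions.</p>

```python
# Sketch of the server-side segmentation step: the buffered stream is cut
# into contiguous speech segments. A simple energy threshold stands in
# for the Pyannote/Diart speaker segmentation used in the real system.

def segment_speech(frames, threshold=0.1):
    """Split buffered (timestamp, energy) frames into speech segments,
    returned as (start_ts, end_ts) pairs where energy exceeds the threshold."""
    segments, current = [], []
    for ts, energy in frames:
        if energy > threshold:
            current.append(ts)
        elif current:
            segments.append((current[0], current[-1]))
            current = []
    if current:
        segments.append((current[0], current[-1]))
    return segments
```

        <p>Each resulting segment would then be Opus-encoded and written to an OGG file on the server, together with its generated ID and timestamp.</p>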
      </sec>
      <sec id="sec-3-3">
        <title>3.3. API for External Services</title>
        <p>On top of the chat functionalities, we offer an API for external services to interact with
Hyperchalk. These external services can access the respective board states and manipulate them to react
dynamically and interact with users. This allows for implementing flexible, interactive learning
activities in Hyperchalk through external backend services.</p>
        <p>In this context, we also allow for the exchange of audio streams and chat messages between
Hyperchalk and external services. On the one hand, this allows practitioners to develop and
couple live analytics services like live transcription, affect detection or discourse analysis with
the software. On the other hand, external services can use these APIs to pipe audio streams
and chat messages into rooms so that it is possible to integrate voice-driven chatbots or virtual
instructors that interact with the learners via Hyperchalk.</p>
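        <p>An external bot using these APIs might look like the following sketch. The transport, message fields and method names are assumptions for illustration, not the documented Hyperchalk API.</p>

```python
# Illustrative sketch of an external chatbot service: it reacts to chat
# messages relayed by the server and can push replies and board updates
# back. All endpoint/field names here are assumptions for illustration.

class HyperchalkBot:
    def __init__(self, transport):
        # transport stands in for a WebSocket/HTTP client to the server
        self.transport = transport

    def on_chat_message(self, message):
        """React to a learner's chat message with a text reply."""
        if "help" in message["body"].lower():
            self.send_chat(message["room"], "How can I support you?")

    def send_chat(self, room, text):
        """Pipe a chat message into the room."""
        self.transport.send({"type": "chat", "room": room, "body": text})

    def add_element(self, room, element):
        """Manipulate the board state, e.g. place a hint as a new element."""
        self.transport.send(
            {"type": "board_update", "room": room, "element": element}
        )
```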
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this paper, we introduced and outlined text and voice chat features for Hyperchalk. With
these, we aim to make the software ready for multimodal learning analytics in order to study the
collaboration processes of users within boards. Moreover, we introduced the option to couple
external services with Hyperchalk that interact with users through text and audio and that
react to and manipulate the board state.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Menzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gombert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Di</given-names>
            <surname>Mitri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Drachsler</surname>
          </string-name>
          ,
          <article-title>Superpowers in the classroom: Hyperchalk is an online whiteboard for learning analytics data collection</article-title>
          , in: I. Hilliger,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Muñoz-Merino</surname>
          </string-name>
          , T. De Laet,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ortega-Arranz</surname>
          </string-name>
          , T. Farrell (Eds.),
          <source>Educating for a New Future: Making Sense of Technology-Enhanced Learning Adoption</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>463</fpage>
          -
          <lpage>469</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Praharaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Scheffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Drachsler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Specht</surname>
          </string-name>
          ,
          <article-title>Literature review on co-located collaboration modeling using multimodal learning analytics-can we go the whole nine yards?</article-title>
          ,
          <source>IEEE Transactions on Learning Technologies</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>367</fpage>
          -
          <lpage>385</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jonassen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Davidson</surname>
          </string-name>
          , M. Collins, J. Campbell,
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Haag</surname>
          </string-name>
          ,
          <article-title>Constructivism and computer-mediated communication in distance education</article-title>
          ,
          <source>American Journal of Distance Education</source>
          <volume>9</volume>
          (
          <year>1995</year>
          )
          <fpage>7</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W.</given-names>
            <surname>Doise</surname>
          </string-name>
          ,
          <article-title>On the social development of the intellect</article-title>
          ,
          <source>in: The Future of Piagetian Theory</source>
          ,
          Springer US
          ,
          <year>1985</year>
          , pp.
          <fpage>95</fpage>
          -
          <lpage>121</lpage>
          . URL: https://doi.org/10.1007/978-1-4684-4925-9_5. doi:10.1007/978-1-4684-4925-9_5.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Strijbos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <article-title>Emerging and scripted roles in computer-supported collaborative learning</article-title>
          ,
          <source>Computers in Human Behavior</source>
          <volume>26</volume>
          (
          <year>2010</year>
          )
          <fpage>491</fpage>
          -
          <lpage>494</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N. M. M.</given-names>
            <surname>Dowell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Nixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Graesser</surname>
          </string-name>
          ,
          <article-title>Group communication analysis: A computational linguistics approach for detecting sociocognitive roles in multiparty interactions</article-title>
          ,
          <source>Behavior Research Methods</source>
          <volume>51</volume>
          (
          <year>2018</year>
          )
          <fpage>1007</fpage>
          -
          <lpage>1041</lpage>
          . URL: https://doi.org/10.3758/s13428-018-1102-z. doi:10.3758/s13428-018-1102-z.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Menzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gombert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weidlich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Drachsler</surname>
          </string-name>
          ,
          <article-title>Why you should give your students automatic process feedback on their collaboration: Evidence from a randomized experiment</article-title>
          , in: O.
          <string-name>
            <surname>Viberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Jivet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Muñoz-Merino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perifanou</surname>
          </string-name>
          , T. Papathoma (Eds.),
          <source>Responsive and Sustainable Educational Futures</source>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>198</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Di Mitri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Specht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Drachsler</surname>
          </string-name>
          ,
          <article-title>From signals to knowledge: A conceptual model for multimodal learning analytics</article-title>
          ,
          <source>Journal of Computer Assisted Learning</source>
          <volume>34</volume>
          (
          <year>2018</year>
          )
          <fpage>338</fpage>
          -
          <lpage>349</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Saint-Andre</surname>
          </string-name>
          ,
          <article-title>Extensible Messaging and Presence Protocol (XMPP): Core</article-title>
          , RFC
          <volume>6120</volume>
          ,
          <year>2011</year>
          . URL: https://www.rfc-editor.org/info/rfc6120. doi:10.17487/RFC6120.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bredin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Coria</surname>
          </string-name>
          , G. Gelly,
          <string-name>
            <given-names>P.</given-names>
            <surname>Korshunov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lavechin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fustes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Titeux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Bouaziz</surname>
          </string-name>
          , M.-P. Gill,
          <article-title>pyannote.audio: neural building blocks for speaker diarization</article-title>
          ,
          in:
          <source>ICASSP 2020, IEEE International Conference on Acoustics, Speech, and Signal Processing</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bredin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Laurent</surname>
          </string-name>
          ,
          <article-title>End-to-end speaker segmentation for overlap-aware resegmentation</article-title>
          ,
          <source>in: Proc. Interspeech</source>
          <year>2021</year>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Coria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bredin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghannay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosset</surname>
          </string-name>
          ,
          <article-title>Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation</article-title>
          ,
          <source>in: 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1139</fpage>
          -
          <lpage>1146</lpage>
          .
          doi:10.1109/ASRU51503.2021.9688044.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sobot</surname>
          </string-name>
          , Pedalboard,
          <year>2021</year>
          . URL: https://doi.org/10.5281/zenodo.7817838.
          doi:10.5281/zenodo.7817838.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>