<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Pre-conference Workshop), March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Machine Learning for Multimodal Learning Analytics and Feedback</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hiroaki Kawashima</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Hyogo</institution>
          ,
          <addr-line>Kobe, Hyogo 6512197</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>14</volume>
      <issue>2023</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Multimodal measurement of human learning enables a data-driven approach to learning analysis and feedback generation. This position paper discusses the possibility of feedback based on multimodal learning analytics from the viewpoint of how machine learning methods can be applied. In particular, we first discuss, through some research examples, how (1) behavioral measurements, such as learner browsing logs and eye tracking, and (2) content analysis of learning materials can lead to (3) prediction and modeling of learners' states (e.g., performance) and (4) feedback generation, such as information presentation and content optimization. We then show future research directions of machine learning-based modeling of learners' states for feedback generation.</p>
      </abstract>
      <kwd-group>
        <kwd>learning analytics</kwd>
        <kwd>multimodal</kwd>
        <kwd>e-book log</kwd>
        <kwd>eye tracking</kwd>
        <kwd>content generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Various measurements of learning behaviors, such as clickstreams of e-book manipulation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
and gazing patterns at lecture videos [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ], have been introduced to study individually adaptive
learning in higher education. These multimodal observation data enable precise, data-driven
learning analysis, which previously relied mainly on instructors’ experience. On the other
hand, the advancement of machine learning enables detailed content analysis of text and image
data [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. By integrating (1) multimodal measurement of human learning behaviors and (2)
multimodal analysis of learning materials through fine-grained learning analytics, we are aiming
at (3) estimating learners’ states and (4) generating feedback to individuals adaptive to their
situations. In higher education, how to increase feedback frequency to students with a limited
number of teaching staff is an important issue. Data-driven or machine-driven feedback has the
potential to support various types of learning, not only for students but also for the improvement
and optimization of teaching methods and learning materials on the teachers’ side.
      </p>
      <p>
        The concept of feedback loops with machine learning models has been discussed in the field
of multimodal learning analytics (MMLA) (e.g., [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), where the analysis of multimodal learning
behavior is the main focus. This position paper extends the idea of feedback loops in learning
analytics, focusing on using content analysis (i.e., learning material) with behavioral data and
automatic content generation based on machine learning techniques. In Sec. 2, we discuss
technical components in our assumed feedback loop, limiting the learning context to e-book-based
and video-viewing-based learning situations. Section 3 shows some examples of performance
prediction models and slide-content generation, which have the potential to be used for feedback.
We then discuss an important research question for feedback generation, “what is an appropriate
representation of learners’ states?”, in Sec. 4.
      </p>
      <sec id="sec-1-1">
        <title>Figure 1</title>
        <p>[Figure 1: The assumed feedback loop. (1) Behavioral measurement of a target learner
(e-book system, eye tracker / web camera) yields behavioral features such as page transitions,
comments / marks, and gaze patterns. (2) Content analysis of learning materials (e-books, texts,
videos, quizzes, etc.) yields content features such as semantic and design features. (3) Machine
learning integrates both into a learner model (performance prediction, state estimation) covering
learner states such as knowledge, attention, interest, and learning style. (4) Feedback / content
optimization updates the feedback design for the target learner, e.g., visualization and
recommendation, delivered as direct feedback (dashboards, chatbots, e-portfolios, quizzes) or as
updated learning materials.]</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Technical Components for the Feedback Loop</title>
      <p>
        The overall structure of the feedback loop envisioned in this paper is shown in Fig. 1. We
consider electronic text browsing logs (e.g., clickstreams) and eye-tracking data for the learners’
behavioral measurement in Fig. 1 (1). With the spread of learning management systems (LMSs)
and massive open online courses (MOOCs), learners’ behaviors, such as signing in/out and
assignment submission, are obtained by system logs. Besides, more detailed behaviors can now
be measured by e-book systems and online-course systems as operation logs of lecture materials
(e.g., page transitions, adding or deleting markers) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and lecture videos (e.g., pause, rewind) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Furthermore, additional devices and software enable multimodal behavioral measurements, such
as eye-gaze tracking, which tell us fine-grained in-class activities of learners [
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ]. This paper
assumes learning situations where such e-book systems or eye-gaze trackers are introduced.
      </p>
      <p>
        Meanwhile, the potential and application range of content analysis in Fig. 1 (2) are now
increasing because of the advancement of machine learning. In particular, its media-analysis
capability covers a wide range of content, including textbooks, quiz questions, lecture slides,
videos, and audio. Neural networks such as recurrent neural networks (RNNs) and
Transformers [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] have recently made it possible to obtain detailed features of words, sentences, and
documents. The same applies to image features using convolutional neural networks (CNNs).
As a result, the features of multimodal content can be extracted in vector representations, which
can be used for various types of analysis, such as performance prediction and similarity analysis.
      </p>
      <p>Behavioral data captured in (1) and the content features obtained in (2) are then used for
learners’ modeling, such as performance prediction and the estimation of learners’ state (e.g.,
mental and cognitive states), using machine learning methods (Fig. 1 (3)). As many machine
learning models lack interpretability or explainability, we need to consider how explainable the
models should be, which depends on feedback objectives (see Sec. 4 for a detailed discussion).</p>
      <p>
        For the feedback step in Fig. 1 (4), it is important to decide whom to target and when and what
to provide. For example, it is possible to provide feedback to teachers on which slides students
browse during class [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Providing information on which slides or topics students struggled
with is also helpful in supporting their reflections after the class. Furthermore, personalized
summaries or converted content (e.g., videos and slides for easier comprehension) can be
generated by optimizing learning materials. Here, various types of feedback can be considered
through MMLA depending on how the components from (1) to (4) are used.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Examples of Learning Analytics toward Feedback Generation</title>
      <p>This section introduces research examples we are working on toward realizing feedback,
to illustrate some of the technical components described in Sec. 2 and to discuss how they can be combined.</p>
      <sec id="sec-3-1">
        <title>3.1. Performance Prediction through Behavioral and Content Analysis</title>
        <p>In order to predict each student’s performance, we can utilize both (a) what learning content
is used and (b) how each student behaves. We here hypothesize that combining the content
information from (a) and behavioral features from (b) will enable us to achieve more accurate
performance prediction compared to only using content-independent features (b).</p>
        <p>
          Browsing-log data on an e-book system. Some universities use e-book systems to obtain
the log data of students’ browsing behavior when manipulating textbooks or slides [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Such
browsing-log data enable various kinds of data mining and analytics, such as the discovery of
behavioral changes before and after COVID-19 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and grade prediction based on machine
learning [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. With such log data, we predict quiz scores by combining
information on slide content (e.g., text) with the features of students’ behavioral data (i.e.,
operation logs obtained from the e-book system). Here, we utilize Sentence-BERT [12] to obtain
the embedded vector representation of each slide page and take the weighted sum of the vectors
using the duration of page viewing as the weights. Then, the obtained vector is used as input for
a gradient-boosting model to predict scores. The results show that using such features increases
the accuracy of predicting the quiz scores, which suggests that content-dependent behavioral
features are informative in predicting each learner’s performance level.
        </p>
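        <p>A minimal sketch of this feature construction, assuming made-up two-dimensional page
vectors in place of real Sentence-BERT embeddings; in the actual pipeline, the resulting vector
is fed to a gradient-boosting model to predict the quiz score:</p>
```python
def viewing_weighted_feature(page_vectors, view_durations):
    """Duration-weighted sum of per-page embedding vectors.

    page_vectors: one embedding vector per slide page (toy values here;
    real vectors would come from an encoder such as Sentence-BERT)
    view_durations: seconds one student spent viewing each page
    """
    total = sum(view_durations)
    dim = len(page_vectors[0])
    feature = [0.0] * dim
    for vec, dur in zip(page_vectors, view_durations):
        w = dur / total  # normalized viewing-time weight
        for i in range(dim):
            feature[i] += w * vec[i]
    return feature

pages = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # hypothetical page embeddings
durations = [30.0, 60.0, 10.0]                  # seconds spent on each page
x = viewing_weighted_feature(pages, durations)
print(x)  # this vector is the input to a gradient-boosting regressor
```
        <p>Pages viewed longer thus contribute more of their content to the learner’s feature vector.</p>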
        <p>
          Eye tracking data during video viewing. Eye tracking is used for a finer-grained
measurement of learners’ behavior during viewing lecture videos. The gazing point series can
be used to estimate learners’ various states, including “mind wandering” [13] and “actively
examining slide materials” [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In addition, as with browsing log data, gaze data may include
features related to students’ performance [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Here, we incorporate attentional states estimated
from eye-gaze data for predicting students’ performance by exploiting a probabilistic model of
switching attentional states [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The model automatically estimates sequences of attentional
states by assuming that each of the three gaze distributions, including time-dependent and
content-dependent distributions, corresponds to one attentional state. Our ongoing research
suggests that not only content-dependent gaze features, such as where the learner looks and
what is looked at, but also the estimated attentional states contribute to quiz-score prediction.
        </p>
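        <p>The switching model of [4] is more elaborate, but the underlying idea of decoding a
sequence of hidden attentional states from gaze observations can be sketched with a small
discrete hidden Markov model; the two states, probabilities, and gaze symbols below are
invented for illustration:</p>
```python
def most_likely_states(observations, states, start, trans, emit):
    """Viterbi decoding of a discrete HMM: the most likely sequence of
    hidden attentional states given a sequence of gaze symbols."""
    # path[s] = (best state sequence ending in s, its probability)
    path = {s: ([s], start[s] * emit[s][observations[0]]) for s in states}
    for obs in observations[1:]:
        new_path = {}
        for s in states:
            prev, p = max(
                ((path[r][0], path[r][1] * trans[r][s] * emit[s][obs])
                 for r in states),
                key=lambda t: t[1])
            new_path[s] = (prev + [s], p)
        path = new_path
    best, _ = max(path.values(), key=lambda t: t[1])
    return best

# Hypothetical two-state model: "examining" the slide vs. "wandering".
states = ["examining", "wandering"]
start = {"examining": 0.6, "wandering": 0.4}
trans = {"examining": {"examining": 0.8, "wandering": 0.2},
         "wandering": {"examining": 0.3, "wandering": 0.7}}
emit = {"examining": {"on_slide": 0.9, "off_slide": 0.1},
        "wandering": {"on_slide": 0.2, "off_slide": 0.8}}

gaze = ["on_slide", "on_slide", "off_slide", "off_slide"]
print(most_likely_states(gaze, states, start, trans, emit))
```
        <p>The decoded state sequence, rather than the raw gaze points, then serves as an input
feature for performance prediction.</p>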
        <p>The prediction of students’ performance (e.g., final grades, comprehension of each topic) can
be used to generate various feedback for both students and teachers. Feedback forms include
visualization on dashboards that learners can check themselves, recommendation of learning
materials, interactive support (e.g., chatbots), and the detection of students who need help
from teachers. The implementation and verification of these feedback methods need to be
investigated in the future.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Automatic Content Generation and Emphasis</title>
        <p>
          Research on feature extraction and automatic text summarization using neural networks is
rapidly advancing in the natural language processing (NLP) community. Text feature extraction
methods, including word embedding (e.g., Word2Vec [14]) and Transformer encoders (e.g.,
BERT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]) are widely used for text analysis exploiting vector representations that encode the
similarity of content.
        </p>
        <p>Lecture-video emphasis using audio. Such vector representations and similarity
measures are key to finding related text across learning materials. In our ongoing work, we use
them to realize automatic spotlighting of video content. Once the lecturer’s utterance is
transcribed by speech-to-text, the similarity between the slide text and the transcript is
computed, and the corresponding slide regions can be emphasized to guide learners’ attention.</p>
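        <p>A rough sketch of the spotlight-selection step, with toy two-dimensional vectors
standing in for real text embeddings of the utterance and of each slide region:</p>
```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def region_to_spotlight(utterance_vec, region_vecs):
    """Index of the slide region whose text embedding is most
    similar to the current utterance embedding."""
    scores = [cosine(utterance_vec, r) for r in region_vecs]
    return scores.index(max(scores))

# Hypothetical embeddings for three slide regions and one utterance.
regions = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
utterance = [0.2, 0.8]
print(region_to_spotlight(utterance, regions))  # → 1
```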
        <p>Slide generation from documents. Transformer-based automatic slide generation from
documents [15, 16] has recently attracted the attention of researchers in the machine learning
community. While the text layout in a slide is not considered in these studies, we try to
generate slides from a Wikipedia page by automatically selecting text layouts to improve the
slide readability. To select appropriate layouts, we train and utilize a BERT-based classifier for
estimating the discourse relationship of a given sentence pair.</p>
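        <p>As a schematic of the sentence-pair classification setup, the sketch below uses a toy
cue-word rule in place of the trained BERT classifier; the relation labels and the
relation-to-layout mapping are hypothetical examples, not the actual label set:</p>
```python
# Toy stand-in for a trained sentence-pair classifier: cue-word rules
# instead of BERT. Labels and layouts are illustrative only.
CONTRAST_CUES = ("however", "but", "in contrast")

def discourse_relation(first, second):
    """Classify the discourse relation of a sentence pair."""
    s = second.lower()
    if s.startswith(CONTRAST_CUES):
        return "contrast"
    return "elaboration"

def layout_for(relation):
    """Map an estimated relation to a slide-layout decision."""
    return {"elaboration": "indented sub-bullet",
            "contrast": "new top-level bullet"}[relation]

pair = ("RNNs process tokens sequentially.",
        "However, Transformers attend to all tokens at once.")
rel = discourse_relation(*pair)
print(rel, "-", layout_for(rel))
```
        <p>In the actual system, a BERT encoder consumes the concatenated sentence pair and a
classification head replaces the cue-word rule.</p>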
        <p>As our ongoing work on automatic content highlighting and generation is not individualized,
it needs to be combined with the learners’ performance modeling described in Sec. 3.1. In addition,
not only the content information but also learners’ behavior data (Fig. 1 (1)) can be used to
find important topics and pages that should be emphasized [17]. Therefore, the integration of
(1), (2), and the learner model (3) is also an interesting challenge.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Machine Learning for Learners’ Modeling</title>
      <p>Machine learning techniques can be applied for various objectives, from predicting students’
performance to generating content, in the context of learning analytics loop as described in
Sec. 2 and Sec. 3. However, it has not yet been elucidated what kind of learners’ information
is required to provide appropriate feedback, or how machine learning models can estimate such
information. Machine learning can contribute to learners’ state modeling at three levels of
representation: the feature level, the manually designed level, and the automatically extracted
level. In this section, we discuss these questions regarding the learners’ state representation
and present several challenges.</p>
      <p>Feature-level representation. The most straightforward representation of a learner’s state
is the amount of activity on learning materials or topics, such as how often the learner has
viewed slide pages, texts, and figures, and how many topic-related questions were answered correctly.
Once the similarity among learning materials or topics is extracted automatically using machine
learning techniques (e.g., Transformer encoders described in Sec. 3.2), the similarity structure
of learning content provides essential information to infer learners’ knowledge in detail in
behavioral and content-related feature space. This leads to the generation of meaningful
feedback, including recommendations and optimization of learning content. Automatic or
semi-automatic knowledge extraction from learning materials (e.g., [18]) may also facilitate this
process.</p>
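      <p>As a toy illustration of how a content-similarity structure can refine raw activity
counts, the sketch below spreads a learner’s per-topic activity across similar topics; all
numbers are hypothetical, and the similarity matrix stands in for one derived from content
embeddings:</p>
```python
def similarity_smoothed_knowledge(activity, similarity):
    """Estimated knowledge of topic i as the similarity-weighted sum
    of activity on all topics (a simple illustrative scheme)."""
    n = len(activity)
    return [sum(similarity[i][j] * activity[j] for j in range(n))
            for i in range(n)]

# Hypothetical values: similarity from content embeddings; activity =
# e.g. page views plus correctly answered questions per topic.
similarity = [[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]]
activity = [5.0, 0.0, 2.0]
print(similarity_smoothed_knowledge(activity, similarity))
```
      <p>Topic 1 has no direct activity, yet it receives a nonzero estimate because the learner
worked heavily on the closely related topic 0.</p>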
      <p>
        Manually designed state representation. The second direction is to manually design a
learner’s state (e.g., emotion, attention, cognition, knowledge levels, skills, key competencies).
While these variables cannot be directly observed from behavioral data, it is possible to estimate
them using machine learning models once the model is trained using annotated datasets [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
The results of quizzes, tests, and exams are also used as the training data for the model. The
performance prediction studies introduced in Sec. 3.1 are examples of this direction, which infer
the state of learners from their behaviors even when they do not take exams. Note that the
recent trends in machine learning enable integrating different modalities, such as multimodal
behavioral data and learning materials, using embedded vector representations extracted from
various encoder models. The estimated knowledge states can be used to generate a variety of
feedback, such as learning material recommendations, as described in Sec. 3.1.
      </p>
      <p>Automatically extracted state representation from data. Machine learning could
contribute to extracting learners’ states automatically as a latent representation in a model through
behavior analysis. This direction corresponds to the “end-to-end modeling of the learner’s
internal state” described in Sec. 2. For example, the techniques of Knowledge Tracing [19],
which model the knowledge state of individual learners from a series of questions and their
answers, have been rapidly advanced with machine learning models. While it also predicts
learners’ performance, similar to Sec. 3.1, a learner’s state is obtained as a latent variable through
end-to-end model training. This opens up the possibility of finding a useful representation of
learners’ states from the data without annotation, although there are difficulties in ensuring
explainability and interpretability. To make the latent variable in the end-to-end model more
interpretable, we can further constrain the model with prior or external knowledge through the
design of model structures, parameter constraints, and regularizers. Graph neural networks [20] are
examples of such model structures that would allow the smooth integration of knowledge in
learning theory.</p>
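      <p>As a minimal illustration of tracing a latent knowledge state from an answer sequence,
the sketch below implements one step of classical Bayesian Knowledge Tracing; Deep Knowledge
Tracing [19] replaces this hand-designed update with an RNN hidden state learned end to end.
The parameter values are illustrative, not fitted:</p>
```python
def bkt_update(p_know, correct, p_learn=0.2, p_slip=0.1, p_guess=0.25):
    """One Bayesian Knowledge Tracing step: posterior probability of
    the latent 'knows the skill' state after observing one answer,
    followed by the learning transition."""
    if correct:
        num = p_know * (1 - p_slip)
        den = num + (1 - p_know) * p_guess
    else:
        num = p_know * p_slip
        den = num + (1 - p_know) * (1 - p_guess)
    posterior = num / den          # Bayes update on the observation
    return posterior + (1 - posterior) * p_learn  # chance of learning

p = 0.3  # prior probability the learner already knows the skill
for answer in [True, True, False, True]:
    p = bkt_update(p, answer)
    print(round(p, 3))
```
      <p>Correct answers raise the estimated knowledge state and incorrect ones lower it, so the
latent variable tracks the learner across the question sequence.</p>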
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This position paper introduced how the recent machine learning techniques can be applied to
the components of MMLA-based feedback loops, focusing on integrated content analysis with
behavioral data and content generation. In particular, we discussed that the recent trend of
representation learning would open up new possibilities for integrating multimodal behavioral
and contextual data. To estimate learners’ states in depth and generate appropriate and detailed
feedback, the fields of learning analytics and machine learning can collaborate in many aspects,
and this leads to a new framework of feedback loops in MMLA.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by JSPS KAKENHI Grant Number JP19H04226.</p>
      <p>[References continued from the reference list:
[11, cont.] ... Early-Warning System for Spotting at-Risk Students by Using eBook Interaction Logs, Smart Learning Environments 6 (2019).
[12] N. Reimers, I. Gurevych, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (2019). arXiv:1908.10084.
[13] S. Hutt, J. Hardey, R. Bixler, A. Stewart, E. Risko, S. K. D'Mello, Gaze-based Detection of Mind Wandering during Lecture Viewing, International Conference on Educational Data Mining (2017) 226–231.
[14] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient Estimation of Word Representations in Vector Space (2013). arXiv:1301.3781.
[15] E. Sun, Y. Hou, D. Wang, Y. Zhang, N. X. R. Wang, D2S: Document-to-Slide Generation via Query-Based Text Summarization (2021) 1405–1418. doi:10.18653/v1/2021.naacl-main.111. arXiv:2105.03664.
[16] T.-J. Fu, W. Y. Wang, D. McDuff, Y. Song, DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents, AAAI Conf. on Artificial Intelligence 36 (2022) 634–642.
[17] A. Shimada, F. Okubo, C. Yin, H. Ogata, Automatic Summarization of Lecture Slides for Enhanced Student Preview: Technical Report and User Study, IEEE Transactions on Learning Technologies 11 (2018) 165–178.
[18] A. Fiallos, X. Ochoa, Semi-Automatic Generation of Intelligent Curricula to Facilitate Learning Analytics, International Conference on Learning Analytics &amp; Knowledge (LAK) (2019) 46–50.
[19] C. Piech, J. Spencer, J. Huang, S. Ganguli, M. Sahami, L. Guibas, J. Sohl-Dickstein, Deep Knowledge Tracing (2015) 1–12. arXiv:1506.05908.
[20] H. Nakagawa, Y. Iwasawa, Y. Matsuo, Graph-based Knowledge Tracing: Modeling Student Proficiency Using Graph Neural Network, International Conference on Learning Representations (2019).]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ogata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mohri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Okubo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shimada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yamada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hirokawa</surname>
          </string-name>
          ,
          <article-title>Learning Analytics for E-Book-Based Educational Big Data in Higher Education, Smart Sensors at the IoT Frontier</article-title>
          , Springer, Cham (
          <year>2017</year>
          )
          <fpage>327</fpage>
          -
          <lpage>350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dillenbourg</surname>
          </string-name>
          ,
          <article-title>“With-me-ness”: A Gaze-Measure for Students' Attention in MOOCs</article-title>
          ,
          <source>International Conference of the Learning Sciences (ICLS)</source>
          (
          <year>2014</year>
          )
          <fpage>1017</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. T.</given-names>
            <surname>Seaton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mitros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Z.</given-names>
            <surname>Gajos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Understanding In-Video Dropouts and Interaction Peaks in Online Lecture Videos</article-title>
          ,
          <source>International Conference on Learning @ Scale</source>
          (
          <year>2014</year>
          )
          <fpage>31</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kawashima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ueki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shimonishi</surname>
          </string-name>
          ,
          <article-title>Modeling Video Viewing Styles with Probabilistic Mode Switching</article-title>
          ,
          <source>International Conference on Computers in Education (ICCE)</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          (
          <year>2019</year>
          ). arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
          ,
          <article-title>Very Deep Convolutional Networks for Large-Scale Image Recognition</article-title>
          ,
          <source>International Conference on Learning Representations (ICLR)</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Di Mitri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Specht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Drachsler</surname>
          </string-name>
          ,
          <article-title>From Signals to Knowledge: A Conceptual Model for Multimodal Learning Analytics</article-title>
          ,
          <source>Journal of Computer Assisted Learning</source>
          <volume>34</volume>
          (
          <year>2018</year>
          )
          <fpage>338</fpage>
          -
          <lpage>349</lpage>
          . doi:10.1111/jcal.12288.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention Is All You Need</article-title>
          (
          <year>2017</year>
          ). arXiv:1706.03762.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shimada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Konomi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ogata</surname>
          </string-name>
          ,
          <article-title>Real-Time Learning Analytics System for Improvement of on-Site Lectures</article-title>
          ,
          <source>Interactive Technology and Smart Education</source>
          <volume>15</volume>
          (
          <year>2018</year>
          )
          <fpage>314</fpage>
          -
          <lpage>331</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kawashima</surname>
          </string-name>
          ,
          <article-title>Comparison of Learning Behaviors on an e-Book System in 2019 Onsite and 2020 Online Courses</article-title>
          ,
          <source>International Conference on Educational Data Mining</source>
          (
          <year>2022</year>
          )
          <fpage>753</fpage>
          -
          <lpage>757</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Akçapınar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Hasnine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Majumdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Flanagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ogata</surname>
          </string-name>
          ,
          <article-title>Developing an Early-Warning System for Spotting at-Risk Students by Using eBook Interaction Logs</article-title>
          ,
          <source>Smart Learning Environments</source>
          <volume>6</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>