Comparing supervised machine learning approaches to automatically code learning designs in mobile learning

Gerti Pishtari¹, Luis P. Prieto¹, María Jesús Rodríguez-Triana¹ and Roberto Martinez-Maldonado²

¹ Tallinn University, Narva maantee 25, 10120 Tallinn, Estonia
[gpishtar, lprisan, mjrt]@tlu.ee
² Monash University, Wellington Rd, Clayton VIC 3800, Australia
Roberto.MartinezMaldonado@monash.edu

Abstract. To understand and support teachers' design practices, researchers in Learning Design manually analyse small sets of design artifacts produced by teachers. This demands substantial manual work and provides a narrow view of the community of teachers behind the designs. This paper compares the performance of different Supervised Machine Learning (SML) approaches to automatically code datasets of learning designs. For this purpose, we extracted a subset of learning designs (i.e., their textual content) from Avastusrada and Smartzoos, two mobile learning tools. We then coded it manually, guided by theoretical models relevant to the context of mobile learning, and used it to train and compare several combinations of SML models and feature extraction techniques. Results show that such models can reliably code learning design datasets and could be used to understand the learning design practices of large communities of teachers in mobile learning and beyond.

Keywords: Supervised Machine Learning, Learning Design, Learning Analytics, Mobile Learning, Contextual Learning

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

1 Mobile Learning from a Learning Design perspective

Mobile Learning (m-learning) activities promote authentic and contextualized learning [18, 12]. These activities usually take place across spaces (physical and digital) and settings (formal, informal, or non-formal) [9, 11]. To enable teachers to design for m-learning, the field of Learning Design (LD) has come up with several authoring tools [13]. For instance, Smartzoos supports the design of geo-localised outdoor learning activities [14], while with GLUESP-AR teachers design activities that happen across multiple physical and digital spaces [9].

Designing learning activities is already a strenuous task for teachers. In m-learning they also have to deal with the complexity of designing across settings and spaces (discussed above), together with the need to possess substantial technical and pedagogical competencies relevant to this context. Mettis and Väljataga [8], after manually analysing designs that teachers created in an m-learning training, concluded that most of the designs were decontextualized (i.e., not related to the situated learning environment) and scored low on the cognitive level (i.e., they mainly required students to remember basic concepts, instead of performing analysis or evaluations). Considering that teachers should have been trained to produce adequate technology-enhanced designs (including m-learning ones) since their pre-service education [8], more research is needed to first understand and then support teachers' practices when designing for m-learning. A first step could be the analysis of databases of design artifacts from existing m-learning tools. To address this gap, researchers would have to analyse large communities of teachers that design for m-learning.
Existing studies have already automatically analysed learning design practices, focusing on teachers' or students' action logs, or on the structure of the designs (e.g., [3]). Nevertheless, when researchers want to consider higher-level aspects (e.g., the pedagogical approaches followed by teachers), the typical approach has been to manually code the designs (see, for instance, [16]). For large datasets an automatic coding strategy would be necessary, as a manual approach is time-consuming. Therefore, in this paper we compare different supervised machine learning (SML) models and feature extraction techniques to automatically code datasets of learning designs for m-learning.

We started by compiling a dataset with learning designs from two m-learning platforms, Avastusrada (avastusrada.ee) and Smartzoos (smartzoos.eu). As a first step, we considered as input features for the algorithms only the textual content (in Estonian) of the learning tasks included in the designs. Although the design artifacts in these tools also include other metadata that could potentially be used as features for the SML algorithms (such as the types of learning tasks and learning resources), these are usually tool-dependent and would not be useful for platform-independent algorithms that could later be used to analyse learning designs from multiple tools.

We manually labelled the dataset guided by theoretical models and taxonomies that are relevant to the context of m-learning and have also been used in previous studies that manually labelled m-learning designs [8, 18]. These include the Revised Bloom's Taxonomy [7], the Inquiry-Based Learning (IBL) model [10], and the categorization of the role of the context in a learning activity [18]. This dataset (with the textual content as input and the corresponding codes as the output to be predicted) was later used to train and compare the different SML models and feature extraction techniques (see Section 3).

2 Machine Learning as analytics for LD in m-learning

Research in Learning Analytics (LA) has largely used SML to predict learners' performance [1, 21]. Furthermore, Prieto et al. [15] attempted to use SML to support researchers by automatically coding diaries of students' learning progress. Yet, the automated analysis of artifacts created by teachers remains an underexplored area. Therefore, this paper presents a comparison of the performance of different SML approaches when trained to code datasets of m-learning designs, guided by theoretical models that are pertinent in the context of m-learning (see Section 3).

Analytics can inform LD practices at different levels: as LA (i.e., informed by student data), as design analytics (i.e., informed by traces of the LD process), or as community analytics (such as metrics about the LD practices of the community of teachers behind a specific m-learning tool) [6]. Few studies reflect this alignment between LD and LA in m-learning [11]. Cases that explicitly addressed this alignment focus mainly on LA for LD [9, 17], while design and community analytics for LD remain unexplored. This paper aims to explore the potential of SML techniques to automatically code learning designs.
Successful algorithms could later be used to analyse large databases of designs from multiple tools, as well as to create systems that provide design and community analytics in m-learning.

3 Methodology

This study is guided by the following research question (RQ): To what extent can SML techniques automatically code datasets of m-learning designs, in terms of IBL phases, context, and cognitive level? To answer this question, we conducted an exploratory study that consists of two parts. During the first part we compiled a dataset of learning tasks (i.e., their textual content), extracted from existing learning designs in Avastusrada and Smartzoos, which are used by two complementary communities of teachers. Avastusrada is used in formal settings (by K-12 schools in Estonia), while Smartzoos is used in informal or non-formal ones (by zoos in Estonia, Sweden and Finland). The dataset had 1,472 learning tasks in Estonian, originating from 168 different designs (114 from Avastusrada and 54 from Smartzoos).

To determine the cognitive level required from learners in each learning task, we coded this dataset using a binary version of the Revised Bloom's taxonomy [7], consisting of: lower-order thinking, representing the two lowest categories (Remember and Understand); and higher-order thinking, representing the rest (Apply, Analyse, Evaluate, Create). This choice was made to distinguish tasks that require students to (at least) apply their knowledge in different learning situations from tasks that do not (a relevant aspect of Avastusrada and Smartzoos). Furthermore, to understand the role played by the situated environment in the learning designs, we coded each task based on the following categories (inspired by [18]): learning in context, i.e., learning happening in a specific situated learning environment; and learning about context, when the situated environment itself is the object of learning. Finally, to understand the extent to which IBL pedagogies (relevant to the context of Avastusrada and Smartzoos) were present in the learning designs, we used the following phases of the IBL model proposed by [10]: Conceptualization, during which learners have to come up with a hypothesis or problem; Investigation, which includes activities such as experimentation and data interpretation; and Conclusion, during which learners reflect upon the results and their implications.

Guided by these theoretical models, we coded the dataset using six binary codes that signalled whether a learning task included: higher-order thinking, learning in context, learning about context, conceptualization, investigation, and conclusion. As the Bloom categories are hierarchical, they are represented by a single code. For the rest we used a separate code for each category (e.g., a task can cover more than one phase of IBL). The dataset was coded by two master's students from the School of Digital Technologies, Tallinn University. We first conducted a pilot in which each coder worked with the same subset of 100 tasks, and the results were compared to establish a common coding approach. The same procedure was repeated until the whole dataset was coded, and doubtful cases were discussed with the first author of this paper (see the full coded dataset in bit.ly/ManuallyLabelledDatasetJLA2021).
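As an illustration of how such an inter-coder agreement check can be computed, the following minimal Python sketch compares the two coders' labels on a shared subset using scikit-learn's implementation of Cohen's kappa. The file and column names are hypothetical; the paper does not report the exact tooling used for this step.

# Hypothetical sketch: inter-coder agreement on the shared pilot subset of tasks.
import pandas as pd
from sklearn.metrics import cohen_kappa_score

CODES = ["higher_order_thinking", "learning_in_context", "learning_about_context",
         "conceptualization", "investigation", "conclusion"]

# One row per task, one binary column per code, same tasks for both coders.
coder_a = pd.read_csv("pilot_coder_a.csv")  # hypothetical file name
coder_b = pd.read_csv("pilot_coder_b.csv")  # hypothetical file name

for code in CODES:
    kappa = cohen_kappa_score(coder_a[code], coder_b[code])
    print(f"{code}: kappa = {kappa:.2f}")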
During the second part of this study we used the dataset to train, evaluate, and compare several common SML models and feature extraction techniques for natural language processing (for each binary code in the dataset). We first preprocessed the textual content (see Figure 1, in green).

Fig. 1. The process of comparing different SML approaches.

Using 80% of the dataset as training set and 20% as testing set, we tested combinations of classic SML models and neural networks with different feature extractors. The first group consisted of classic models, i.e., Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM) with a linear kernel, and Gaussian Processes (GP), combined with feature extractors such as a pre-trained Estonian word2vec with 100 embedding dimensions, used both as continuous bag of words (W2V CBOW) and as skip-gram (W2V SG), a bag of words (BOW), and a bag of words weighted by term frequency-inverse document frequency (TF-IDF BOW). The neural networks included recurrent networks with Long Short-Term Memory units (LSTM), Convolutional Neural Networks (CNN), and a mixed model (CNN+LSTM). These were tested in combination with the word2vec embeddings mentioned above and with an untrained embedding layer. The LSTM consisted of a single bidirectional layer, while the CNN used a single 1-dimensional convolutional layer, both with 64 hidden units. We used early stopping based on the validation loss to avoid overfitting. Finally, we also used the Estonian version of the Bidirectional Encoder Representations from Transformers (EstBERT) [19], with an AdamW optimizer, an initial learning rate of 2e-5, and a single classification layer. The evaluation process was a stratified 5-fold cross-validation, repeated 5 times, and several classification metrics were used for the comparison (see Figure 1). Algorithms with kappa values (the inter-rater reliability between the manual and the automatic coding) lower than 0.65 were not considered reliable [20]. The algorithms were implemented in Python, using the scikit-learn and TensorFlow packages.

4 Results

This section presents a comparison of the combinations of SML models and feature extractors, guided by Cohen's kappa. Results for all the metrics are available in the attached document (bit.ly/ResultsStep2JLA2021). Figure 2 shows that the classic models did not surpass the threshold of kappa > 0.65. Neural networks performed better, but only EstBERT clearly surpassed it. The prevalence of each code, which reflects how balanced the dataset is for that code (see the horizontal axis in Figure 3, where the right side represents balanced codes), had a direct influence on the performance of the classic models, but no significant influence on the performance of the neural networks.

Fig. 2. Distribution of Cohen's kappa between human coders and the different combinations of SML models and feature extractors, for the six codes.

Fig. 3. Variation of the reliability for all the models with the logarithmic prevalence of each code.
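To make the evaluation procedure behind these results concrete, the following minimal sketch shows how one of the classic configurations (TF-IDF bag of words with Logistic Regression) could be evaluated for a single binary code with repeated stratified 5-fold cross-validation and Cohen's kappa in scikit-learn. It is an illustrative simplification of the setup described in Section 3, not the authors' actual implementation; file, column, and variable names are hypothetical.

# Hypothetical sketch: evaluating one classic model + feature extractor pair.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.metrics import cohen_kappa_score, make_scorer

data = pd.read_csv("labelled_tasks.csv")          # task text plus six binary codes
X, y = data["task_text"], data["investigation"]   # one binary code at a time

pipeline = Pipeline([
    ("features", TfidfVectorizer()),              # TF-IDF bag of words features
    ("model", LogisticRegression(max_iter=1000)),
])

# Stratified 5-fold cross-validation, repeated 5 times, scored with Cohen's kappa.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv,
                         scoring=make_scorer(cohen_kappa_score))

print(f"mean kappa = {scores.mean():.2f} (sd = {scores.std():.2f})")
# Configurations whose kappa stays below 0.65 would be discarded as unreliable.

The same scaffolding can be repeated for each of the six codes and for each model and feature-extractor combination, producing the kind of distributions summarised in Figure 2.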
5 Discussion

Regarding our RQ (the performance of SML approaches when coding datasets of m-learning designs), we were able to train algorithms based on EstBERT that, for our particular dataset, were reliable for all six codes (kappa > 0.65). The EstBERT algorithms also performed uniformly well on all the other classification metrics that we used. Thus, SML could be used in the future to support researchers in LD when analysing large datasets of learning designs. In the context of m-learning, similar algorithms could be used to analyse the full databases of Avastusrada and Smartzoos, providing a case of community analytics in m-learning [6], as well as enabling large-scale and in-the-wild studies about the open issue of teachers' design practices in m-learning [8]. Other m-learning tools could benefit from the same SML approach, such as GLUESP-AR [9] or QuestInSitu [17]. Beyond m-learning, our approach could be useful to analyse LD platforms used by large communities of teachers, such as ILDE [5].

Most of the codes in our dataset did not have a balanced distribution (see Figure 3), which is typical in qualitative coding tasks. However, the EstBERT algorithms performed well on all the codes regardless of their prevalence, and constitute an example of dealing with unbalanced datasets (common in education). The dataset used to train and compare the SML approaches constitutes a limitation of this study, as it is not representative of all kinds of designs in m-learning. Also, the manual coding process might have introduced biases that condition the performance of the algorithms. Nevertheless, while in this paper we present only preliminary results of our exploratory study, further optimization of the models could improve their performance. We considered kappa > 0.65 as the threshold value; however, various researchers advocate for different threshold values, or for the inclusion of other metrics (e.g., Shaffer's rho [4]). We used the textual content of the learning tasks as input features. In future work, features such as task type might improve the predictions for tool-specific analyses. The design artifacts were in Estonian, which is both a contribution, as few SML resources exist for this language, and a limitation, as the English versions of word2vec, BERT, etc. are usually pre-trained on larger amounts of data.

6 Conclusion

In this study, we provide an example of how SML approaches can mimic human coders in the context of coding datasets of m-learning designs. We compared different SML models and feature extraction techniques. Models based on EstBERT consistently provided values of kappa > 0.65 and could thus be used to conduct in-the-wild studies of how teachers design for m-learning. Future work will include further optimization of all the models that were considered in this study. In line with recent trends towards models that are transparent to the related stakeholders [2], it is important to further tune the performance of classic models (such as LR) and compare it with that of black-box ones (such as the neural networks). Once further optimized, the best performing algorithms will be used to analyse the learning designs included in Avastusrada and Smartzoos.
A similar approach might be useful to analyse other known LD tools in m-learning (e.g., QuestInSitu [17]) or beyond (e.g., ILDE [5]).

Acknowledgements

This research has been partially funded by the European Union in the context of CEITER (Horizon 2020 Research and Innovation Programme, grant agreement no. 669074). The authors would like to thank the coders for their contribution to this study. Roberto Martinez-Maldonado's research is partly funded by the Jacobs Foundation.

References

1. Chen, F., Cui, Y.: Utilizing student time series behaviour in learning management systems for early prediction of course performance. Journal of Learning Analytics 7(2) (2020) 1–17
2. Conati, C., Porayska-Pomsta, K., Mavrikis, M.: AI in education needs interpretable machine learning: Lessons from open learner modelling. arXiv preprint arXiv:1807.00154 (2018)
3. de Jong, T., Gillet, D., Rodríguez-Triana, M.J., Hovardas, T., Dikke, D., Doran, R., Dziabenko, O., Koslowsky, J., Korventausta, M., Law, E., et al.: Understanding teacher design practices for digital inquiry-based science learning: the case of Go-Lab. Educational Technology Research and Development (2021) 1–28
4. Eagan, B.R., Rogers, B., Serlin, R., Ruis, A.R., Arastoopour Irgens, G., Shaffer, D.W.: Can we rely on IRR? Testing the assumptions of inter-rater reliability. In: International Conference on Computer Supported Collaborative Learning (2017)
5. Hernández-Leo, D., Asensio-Pérez, J.I., Derntl, M., Prieto, L.P., Chacón, J.: ILDE: Community environment for conceptualizing, authoring and deploying learning activities. In: European Conference on Technology Enhanced Learning, Springer (2014) 490–493
6. Hernández-Leo, D., Martinez-Maldonado, R., Pardo, A., Muñoz-Cristóbal, J.A., Rodríguez-Triana, M.J.: Analytics for learning design: A layered framework and tools. British Journal of Educational Technology 50(1) (2019) 139–152
7. Krathwohl, D.R.: A revision of Bloom's taxonomy: An overview. Theory into Practice 41(4) (2002) 212–218
8. Mettis, K., Väljataga, T.: Designing learning experiences for outdoor hybrid learning spaces. British Journal of Educational Technology 52(1) (2021) 498–513
9. Muñoz-Cristóbal, J.A., Rodríguez-Triana, M.J., Gallego-Lema, V., Arribas-Cubero, H.F., Asensio-Pérez, J.I., Martínez-Monés, A.: Monitoring for awareness and reflection in ubiquitous learning environments. International Journal of Human–Computer Interaction 34(2) (2018) 146–165
10. Pedaste, M., Mäeots, M., Siiman, L.A., De Jong, T., Van Riesen, S.A., Kamp, E.T., Manoli, C.C., Zacharia, Z.C., Tsourlidaki, E.: Phases of inquiry-based learning: Definitions and the inquiry cycle. Educational Research Review 14 (2015) 47–61
11. Pishtari, G., Rodríguez-Triana, M.J., Sarmiento-Márquez, E.M., Pérez-Sanagustín, M., Ruiz-Calleja, A., Santos, P., Prieto, L.P., Serrano-Iglesias, S., Väljataga, T.: Learning design and learning analytics in mobile and ubiquitous learning: A systematic review. British Journal of Educational Technology 51(4) (2020) 1078–1100
12. Pishtari, G., Rodríguez-Triana, M.J., Väljataga, T.: A multi-stakeholder perspective of analytics for learning design in location-based learning. International Journal of Mobile and Blended Learning (IJMBL) 13(1) (2021) 1–17
13. Pishtari, G., Rodríguez-Triana, M.J.: An analysis of mobile learning tools in terms of pedagogical affordances and support to the learning activity lifecycle. In: Gil, E., Mor, Y., Dimitriadis, Y., Köppe, C., eds.: Hybrid Learning Spaces. Springer (2021)
14. Pishtari, G., Väljataga, T., Tammets, P., Savitski, P., Rodríguez-Triana, M.J., Ley, T.: Smartzoos: Modular open educational resources for location-based games. In: European Conference on Technology Enhanced Learning, Springer (2017) 513–516
15. Prieto, L.P., Pishtari, G., Rodríguez-Triana, M.J., Eagan, B.: Comparing natural language processing approaches to scale up the automated coding of diaries in single-case learning analytics. In: Second International Conference on Quantitative Ethnography: Conference Proceedings Supplement (2021) 39
16. Rodríguez-Triana, M.J., Prieto, L.P., Pishtari, G.: What do learning designs show about pedagogical adoption? An analysis approach and a case study on inquiry-based learning. (In Press)
17. Santos, P., Pérez-Sanagustín, M., Hernández-Leo, D., Blat, J.: QuestInSitu: From tests to routes for assessment in situ activities. Computers & Education 57(4) (2011) 2517–2534
18. Sharples, M.: Making sense of context for mobile learning. Mobile Learning: The Next Generation (2016) 140–153
19. Tanvir, H., Kittask, C., Sirts, K.: EstBERT: A pretrained language-specific BERT for Estonian. arXiv preprint arXiv:2011.04784 (2020)
20. Viera, A.J., Garrett, J.M., et al.: Understanding interobserver agreement: the kappa statistic. Fam Med 37(5) (2005) 360–363
21. Xu, X., Wang, J., Peng, H., Wu, R.: Prediction of academic performance associated with internet usage behaviors using machine learning algorithms. Computers in Human Behavior 98 (2019) 166–173