From training KPIs to learning KPIs: ensuring
                         effectiveness in learning processes through predictive
                         analytics and data-based tutoring actions
                         Daniela Pellegrini1,† , Mario Santoro2,*,† and Sara Zuzzi1,†
                         1
                             Research and Development Division, Piazza Copernico s.r.l., Via Francesco Gentile, 135, 00173, Roma, Italy
                         2
                             Istituto per le Applicazioni del Calcolo "Mauro Picone" - Consiglio Nazionale delle Ricerche, via dei Taurini 19, 00185, Roma, Italy


                                         Abstract
                                         This work presents the analysis model of the study data available in the LMS platforms specifically designed to
                                         analyze potential critical issues as a functional indicator for the possible achievement of the training objectives and
                                         completion of the course. The illustrated system highlights how the use of statistical indicators and predictability
                                         can be an effective tool for the early identification of possible critical issues in the field of training results, as well
                                         as design and organizational inconsistencies that can weigh on the effectiveness of the training system made
                                         available. Our work explains how adopting a data analysis model applied to training environments provides the
                                         tutoring system with adequate information on potential critical issues to favor targeted interventions on the
                                         participants to prevent risks of training ineffectiveness. At the same time, it analyzes the global quality of the
                                         courses made available through a perspective of data exploration that starts from the learning experience and
                                         enhances the data already present in the LMS platforms.

                                         Keywords
                                         Learning KPI, Critical Issues, Course Quality


                         1. Introduction
                         Evaluating learning experiences and identifying areas for improvement are crucial steps in ensuring
                         effective digital training programs. To this end, our research has developed a comprehensive model for
                         analyzing tracking data from the perspectives of digital learning operators, synthesizing all aspects
                         of training into a unified tool. This approach enables us to create a Macro Index of Performance
                         (MIP) and seven sub-indexes: Results, Study Pace, Course Structure, Computer Adequacy, Community
                         Participation, Educational Tutoring, and Process Tutoring. Using feedback from our first experiments,
                         we have oriented our subsequent research toward analyzing critical situations in tracking data. Our
                         novel framework, LearnalizeR, combines the MIP and sub-indexes to effectively evaluate online courses,
                         identify areas of riskiness, and provide tutors with potential preventive intervention strategies. This
                         study investigates the effectiveness of LearnalizeR through two real-world cases from an Italian public
                         administration, highlighting compelling examples of criticalities and proposed interventions. We aim
                         to demonstrate how LearnalizeR can help address participation and performance criticalities in online
                         courses, ultimately contributing to more effective digital training programs. This paper is structured as
                         follows: after a brief overview of the traditional use of KPIs in evaluating training effectiveness and
                         criticalities, we present the LearnalizeR framework and its components, including MIP and sub-indexes.
                         We then describe the application of LearnalizeR in two real-world cases, highlighting compelling
                         examples of criticalities and proposed interventions. Finally, we summarize our findings and discuss
                         implications for future research.

                         AIxEDU - 2nd International Workshop on Artificial Intelligence Systems in Education, November 25–28, 2024, Bolzano, Italy
                         *
                           Corresponding author.
                         †
                           These authors contributed equally.
                         Email: dpellegrini@pcopernico.it (D. Pellegrini); m.santoro@iac.cnr.it (M. Santoro); szuzzi@pcopernico.it (S. Zuzzi)
                         URL: https://www.semanticase.it/ (D. Pellegrini); https://baltig.cnr.it/users/mario-santoro/projects (M. Santoro);
                         https://www.semanticase.it/ (S. Zuzzi)
                         ORCID: 0000-0001-6626-9430 (M. Santoro)
                                        © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
2. Theoretical Background
The big data available on the corporate LMS platforms, from training tracking data to enrollment
processes and evaluation systems, represents excellent information potential for continuously improving
distance learning projects. The training data, however, are often treated from the point of view of training
with a significant focus on hours provided, people involved, test thresholds reached, performance levels,
and dropout reduction. This focus on training, i.e., on the training offer, represents the point of view of
the training institute or company, which indeed constitutes a component of the problem of training
effectiveness and knowledge transfer. Correctly, training institutes and companies must be able to
evaluate and monitor the training system offered to the participants. However, extensive debates [1]
have arisen on how to effectively evaluate training and its effectiveness, from the issues of the ROI of
training to the most recent concept of training impact and the modern KPIs. The latter are helpful for
those who organize courses, whether to design new content or methodologies or to redefine support
models for training activities. It is equally important, however, to ground these improvements in the
users’ actual behavior: to adopt a learning perspective, that is, to use data analysis to re-read the
experiences and behaviors carried out and to redesign the training starting from the actual impacts and
experiences of the participants.
   Our perspective on data-based decision-making changes significantly when we consider that choices
are derived from natural phenomena rather than just good intuitions or professional expertise. This
shift in perspective means that our options for redesign and tutoring intervention become user-oriented
and needs-oriented. As such, these interventions aim to validate the effectiveness of designed content
by considering real-world experiences. Ultimately, this approach enriches didactic and management
options by harmonizing them with typical study behaviors.

2.1. The use of data science in training
To meet the need for course evaluation and to free the learning process from standardized formalisms,
practices, and customers, one must equip oneself with data analysis tools and intervention strategies. As
part of the LearnalyzeR experience [2], we developed a criticality detection system that aims to support
the day-by-day intervention of tutors through predictive systems, analysis of participation behaviors,
and redesign of courses. As a first step, we defined criticality as the behavior of a non-negligible part
of participants that leads to a course experience that does not satisfy their needs in terms of fruition,
participation, times, and performance. We oriented the survey tools toward the recognition of:

    • Criticalities from already known patterns using only course data. They are calculated during the
      course, and the course is already critical when observed.
    • Predictive patterns of criticality with respect to historical behavior. They are calculated during
      the course, and the course is not critical when observed.
    • Distorted use of times and verification of units at risk. It can be calculated only at the end of each
      unit, and then, it can be summarized only at the end of the course.

   We describe our model in detail in Section 3.
   Learning to understand the potential criticalities allows one to adopt a predictive perspective on the
problems, reducing the negative impacts of ineffective courses, managing investments in training
better, and at the same time increasing the usefulness, perceived effectiveness, and appreciation
of training initiatives. The ROI perspective is managed in progress in terms of containing the risks of
ineffectiveness.
   Adopting a data analysis perspective also makes it possible to overcome the conformism of the support
activities, which are sometimes affected by some typical distortions in the reading of the LMS tracking,
as explained in Table 1.
Table 1: Potential risks in reading tracking data

Problem | Description | Data | Actions
Too much generalization | Looking at the global tracking data, it is possible to generate false results because the view of the global scenario is conditioned by a few more typical situations. Risk: excessive categorization of behaviors | Progress / completions | Different interventions by course progress classes
Naïve approach | Not adopting a data analysis model risks making subjective considerations based on experience and not corroborated by data. Risk: data selection bias | Times and completions / progress | Reminders on specific cases and notes
Over/underestimate | An intuitive approach to data, based on an exploratory method, risks misleading interpretations if not corroborated by appropriate analysis. Risk: halo effect of detected problems | Times and completions / progress | Wide and widespread reminders on highlighted case studies


2.2. KPIs from training to learning
Traditionally, KPIs are used to measure the effectiveness of training. The focus was on the added value
that training brought to the company. In the literature, the focus is on training that produces enabling
results for business performance. The training system is oriented towards producing specific results.
This highlights a training perspective approach, i.e. the enhancement of the training offer and its ability
to produce results and economic value.
   The theme of evaluation was already widely addressed at the end of the 1950s with the classic but
still significant model of Kirkpatrick [3, 4, 5, 6]. It focused on the evaluation of the effectiveness
of training as well as on the individual-environment pairing, in favor of a sequential four-step
process in which the results of training are progressively brought back by the individual to his or her
context, and therefore require adequate evaluation tools. This dynamic view of the
assessment concerns the impacts in terms of:

    • Reactions: the degree to which participants find the training favorable, engaging, and relevant
      to their jobs (typically the evaluation of satisfaction of the training experience);
    • Learning: the degree to which participants acquire the intended knowledge, skills, attitude,
      confidence, and commitment based on their participation in the training (alias the evaluation of
      the added value of training, on the cognitive-affective-attitudinal level);
    • Behavior: the degree to which participants apply what they learned during training when they
      are back on the job (typically the manifestation of competent behaviors);
    • Results: the degree to which targeted outcomes occur as a result of the training and the support
      and accountability package.
   Research intensified from the 1980s onward, in particular on the issue of the effectiveness
of training and its results. For example, the Baldwin and Ford model [7] highlights how learning,
to be realized, needs the interaction between the person, the appropriate structuring of training, and
an organizational context that provides opportunities to implement learning. Subsequently, Noe’s
model [8] highlighted how the transfer of individual learning (linked to the acceptance of feedback,
individual expectations, and an internal locus of control) to work passes through the support of the
organization, including social support (support for motivation), to implement adequate performance.
Thus, andragogic theories highlight how the transfer of learning is strongly linked to the perception of
the usefulness of learning to meet career expectations and satisfaction, within a work context oriented to
the management of feedback, to a climate of support between peers and superiors, and to the opportunities
for application of learning [9]. In all these studies, the measurement and objectification of
results has always been the main target of criticism of the evaluation models. The main criticisms
by Alliger and Janak (1989) concern the sequentiality and the different meanings and values of the
various levels of evaluation, with levels 3 and 4 being more important but often not measured because of
the difficulties of detection [10, 11]. A further criticism is the inextricable connection between the
levels, with the consequent difficulty of isolating and measuring the impacts separately. Subsequently,
in parallel with the growth of diversified training experiences (including experiential ones), a
discussion on measuring actual results emerged in the literature.
   An alternative to Kirkpatrick’s approach [3, 4, 5, 6] for measuring improvements made by the
training function is the method of Robert S. Kaplan and David P. Norton [12], typically used to evaluate
the performance of organizational units through Balanced Scorecards (BSC). This model considers four
different perspectives:
    • economic and financial (economic results that effectively describe the relationship between
      investment in training and the results generated);
    • customer-market (results in terms of satisfaction of internal customers alias participants);
    • internal processes (effectiveness of key processes for achieving training objectives);
    • learning-growth (generated learning outcomes).
   The BSC method (also in the specific case of training) is carried out by identifying KPIs (Key Per-
formance Indicators) in the four areas of investigation. This way, a 360° overall evaluation of the
analyzed activity is obtained. From the considerations that emerged over time and with the emergence
of the knowledge-based economy and a complex and flexible labor market, continuing education has
acquired strategic importance for both individuals and organizations, with the need to understand and
enhance the value of human capital. Although conceptually, Kirkpatrick’s model maintains its value
as a complex conceptual framework of intervention in the training evaluation field, it does require
corrective measures in the context of meanings and tools.
   "My son Jim asked me, 'How much has your model changed since it was introduced in 1959?' I replied that
the model remains essentially the same. The concepts, principles, and techniques are as applicable today as
when I introduced the model." [10, 11]
   If the distinction between learning and behavior is the main merit of Kirkpatrick’s model [3, 4, 5, 6],
some criticisms arose over time, namely that evaluation activities focus on the first level and partly on
the second, while evaluations at the third and fourth levels appear very complex to carry out, are
attempted only sporadically, and often rely on partly superficial and subjective approaches [13]. From the
investigation of Kennedy et al. [14], the problem does not seem to be a lack of interest: even training
professionals (practitioners) show a broader interest in the higher levels, with a lower perception of the
usefulness of the lower levels.
   The most recent theories look at experimental analysis to measure training outcomes and to compare
objectives and results through direct evidence [15, 16, 17]. Therefore, the models focus on summative
evaluation (i.e., tangible, measurable results at the end of the evaluation).
   Other interesting models are:
    • IPO Systematic approach (input, process, output) of Bushnell [18] contends that Kirkpatrick’s
      model focuses only on what happens after the training, but not the entire training process.
    • Brinkerhoff’s six-stage evaluation model [19]; [20] advocated circular evaluation by measur-
      ing all the instructional design elements. The Six-Stage Evaluation Model starts with a needs
      assessment and identifies training goals. Stage two evaluates the program design, and stage
      three evaluates program implementation, similar to Kirkpatrick’s Level 1 evaluation. Stage four
      evaluates the learning and is identical to Kirkpatrick’s Level 2. Stage five evaluates behavior and
      is similar to Kirkpatrick’s Level 3 evaluation. Stage six evaluates how much learning transferred
      to the results, as does Kirkpatrick’s Level 4.
    • Swanson’s performance improvement evaluation model (PLS). Swanson [21] argued that
      evaluation systems should be directed toward performance improvement across three evaluation
      domains: performance (P), learning (L), and satisfaction (S).
    • Holton’s three-level HRD evaluation and research model. Holton’s HRD [22]; [23] identifies
      three outcomes of training – learning, individual performance, and organizational results, all
      similar to Kirkpatrick’s Levels 2, 3, and 4 because Holton stressed that reactions should not be
      considered a primary outcome of training. It identifies several variables known to affect the
      effectiveness of a training program.

   All cited research highlights the need to focus on the theme of training transfer as generally defined
in the literature. The transfer to the work context depends on the level of involvement and motivation
of the participant, the quality of training (understood as attention to learning in concrete reference to
the activities to be carried out), and the organizational context (understood as openness to practical
experimentation of learning). The latest contributions concerning the training evaluation model
highlight the importance of a multi-actor approach in which the different interests and potentially
conflicting needs of the stakeholders involved, internal and external to the organization, are represented
[16].
   A limitation commonly indicated by the literature is the need to equip oneself with adequate tools
to identify, study and highlight the results brought by the training. So the topic of data is relevant in
any training evaluation process. Our work, therefore, is part of the theoretical framework between
level 1 and level 2 of the Kirkpatrick model. We will show a model to reduce the potential effects of
ineffectiveness due to the training set, the material, or the object of the training, to facilitate a fluid
and practical learning experience to maximize training opportunities and then impact organizational
behaviors and final results.


3. A model to detect criticality
As extensively described in our previous work [2], we developed a model of analysis of the tracking data
starting from the experiences of digital learning operators to create a tool capable of synthesizing all
aspects of training. We developed a Macro Index of Performance (MIP) and seven sub-indexes: Results
(IR), Study Pace (ISP), Course Structure (ICS), Computer Adequacy (ICA), Community Participation (ICP),
Educational Tutoring (IET), and Process Tutoring (IPT). Each sub-index consists of variables that describe
its various aspects.
   The MIP (and its sub-indexes) makes online, classroom, and blended courses comparable through a unified
analysis model. It can discriminate between different experiences and indicate the expected outcomes based
on similar data, guiding the tutors toward differentiated intervention methods to support and facilitate
learning.
   Using the feedback received after the first experiments, we oriented the subsequent research to study
critical situations in the tracking. Using the values of the MIP (and sub-indexes) and the raw values
of the tracking, we could highlight areas of riskiness and provide tutors with potential preventive
intervention strategies. The MIP and sub-indexes describe the students’ behaviors and indicate any
critical issues that characterize the courses. Through the automated analysis of the courses, tutors can
focus on those who are most at risk for different types of critical issues. Following the general definition
of criticalities in Section 2.1, we defined three classes of criticalities:

    • Participation (CPA), the access and use of course materials;
    • Performance (CPE), the effectiveness of the training course according to the training plan;
    • Structure (CS), criticality in using educational resources with excessive use of time and expenditure
      of organizational costs.

Figure 1: Typical participation trend in an online course

3.1. Criticality Details
It is essential to define what a critical course means, both for a group of participants and for the
individual user. Generally, the criticality of a course is assessed based on the use of the time allocated
and the need to conclude all the teaching activities to reach the minimum requirements. However, a
course is not always critical, or at least not for everyone. Generally, participation increases over the
course and peaks at about three quarters of the time available (Figure 1).


Table 2: Criteria for participation criticality algorithms (CPA)

CPA type | Condition | Problem | Intervention
A | Elapsed available time between 15% and 25%; not less than 75% of participants have a use time of less than 25% | Reduced number of users who have started | Reminder to Start
B | Elapsed available time between 25% and 50%; not less than 50% of participants have a use time of less than 50% | Reduced number of users with significant progress | Reminder to Study
C | Elapsed available time between 50% and 75%; not less than 25% of participants have a use time of less than 25% | Risk of drop out | Reminder to Recover
D | Elapsed available time over 75%; not less than 25% of participants have a use time of less than 75% | Users with completing difficulty | Reminder to Complete
X | Available time ended; at least 10% of users did not complete the course | Incomplete course | Extension or Recovery

Table 3: Criteria for structural criticality algorithms (CS)

CS type | Condition | Problem | Intervention
Z | At least 50% of the participants have fruition times greater than twice the time provided in the course design | Inconsistent use of time, educational resources criticality | Review course rules and/or design


Table 4: Criteria for performance criticality algorithms (CPE)

CPE type | Condition | Problem | Intervention
A | Elapsed available time between 20% and 40%; less than 20% of participants with MIP forecast at the end of the course > threshold value | Inadequate study behavior | Encourage effective study
B | Elapsed available time between 40% and 75%; less than 50% of participants with MIP forecast at the end of the course > threshold value | Study behavior at risk of ineffectiveness | Make organizational proposals and give advice
C | Elapsed available time between 75% and 80%; less than 50% of participants with MIP forecast at the end of the course > threshold value | Highly risky study behavior | Give organizational proposals, re-registrations / extensions
X | Available time ended; at least 10% of users with end-of-course MIP below the threshold | Incomplete course | Extension or Recovery

   Different rules apply as the available course time elapses. Using this framework, we built a
criticality detection system inside LearnalyzeR (our learning analytics tool). For each critical
issue and the problems it highlights, we identified specific interventions that the tutors and
the system can implement to prevent the risk of ineffectiveness (Table 2, Table 3, Table 4).
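
The participation and structure rules of Table 2 and Table 3 can be read as simple threshold checks. The following is a minimal sketch of that reading, not the LearnalyzeR implementation: the function names, the input format (elapsed time and use time expressed as fractions between 0 and 1), and the half-open treatment of the range boundaries are all assumptions made for illustration.

```python
# Rules transcribed from Table 2 as fractions:
# (type, elapsed_from, elapsed_to, min_share_of_participants, use_time_below).
# Treating the elapsed ranges as half-open [from, to) is an assumption.
CPA_RULES = [
    ("A", 0.15, 0.25, 0.75, 0.25),
    ("B", 0.25, 0.50, 0.50, 0.50),
    ("C", 0.50, 0.75, 0.25, 0.25),
    ("D", 0.75, 1.00, 0.25, 0.75),
]


def share(flags):
    """Fraction of True values in an iterable of booleans (0.0 if empty)."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def participation_criticality(elapsed, use_fractions, completed):
    """Return the CPA type ("A", "B", "C", "D", "X") or None.

    elapsed: fraction of the available time already elapsed.
    use_fractions: per-participant use time as a fraction of the expected time.
    completed: per-participant completion flags, used only once time has ended.
    """
    if elapsed >= 1.0:
        # Type X: time ended and at least 10% of users did not complete.
        return "X" if share(not c for c in completed) >= 0.10 else None
    for label, start, end, min_share, below in CPA_RULES:
        in_window = start <= elapsed < end
        if in_window and share(u < below for u in use_fractions) >= min_share:
            return label
    return None


def structural_criticality(fruition_minutes, designed_minutes):
    """Return "Z" if at least 50% of participants exceed twice the designed time."""
    over = share(t > 2 * designed_minutes for t in fruition_minutes)
    return "Z" if over >= 0.50 else None
```

For example, with 20% of the time elapsed and three participants out of four below a quarter of the expected use time, `participation_criticality(0.20, [0.0, 0.1, 0.2, 0.6], [])` returns `"A"`, which Table 2 pairs with a reminder to start.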

3.2. Critical, predictive and preventive tutoring models
The algorithms for detecting critical issues in a course compare the raw usage data and the MIP of all the
course users with what the course designers and the commissioning client expect. It is necessary to
distinguish two types of expected behaviors: the one during the delivery and the one at the end of the
course. The criticality algorithms CPA (Table 2) and CS (Table 3) intercept the expected behaviors during
the course, even at the beginning. In contrast, the algorithms for CPE (Table 4) are based on the prediction
of performance at the end of the course (final MIP) and depend on the value of the MIP at various steps of
the course (available time and completion). For this last criticality, we used a Bayesian beta regression
model [2], taking the minimum of the MIP estimation interval as a precaution. We then chose the threshold
value in Table 4 by analyzing the MIP values in the previously closed courses. In the following examples,
the chosen threshold value for the minimum forecasted MIP is 70 (see footnote 1). The model, i.e., its
parameters (see footnote 2), is adaptable, through the retrospective analysis of the closed courses, to the
customer’s requests, the course structure, and the performance requests.
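
As an illustration of how the performance rules of Table 4 combine with the precautionary use of the interval minimum, the sketch below assumes that the forecasting model has already produced, for each participant, an estimation interval `(lower, upper)` for the final MIP. It does not reproduce the Bayesian beta regression of [2]; the function names, the input format, and the half-open range boundaries are hypothetical choices.

```python
MIP_THRESHOLD = 70.0  # threshold quoted in the text; the MIP ranges from 0 to 100

# Rules transcribed from Table 4 as fractions:
# (type, elapsed_from, elapsed_to, share_of_participants_expected_above_threshold).
# Treating the elapsed ranges as half-open [from, to) is an assumption.
CPE_RULES = [
    ("A", 0.20, 0.40, 0.20),
    ("B", 0.40, 0.75, 0.50),
    ("C", 0.75, 0.80, 0.50),
]


def performance_criticality(elapsed, forecast_intervals, threshold=MIP_THRESHOLD):
    """Return the CPE type ("A", "B", "C") during the course, or None.

    forecast_intervals: per-participant (lower, upper) bounds of the forecast
    final MIP. Only the lower bound is used, as a precaution.
    """
    lowers = [lower for lower, _upper in forecast_intervals]
    if not lowers:
        return None
    share_above = sum(lower > threshold for lower in lowers) / len(lowers)
    for label, start, end, required_share in CPE_RULES:
        if start <= elapsed < end and share_above < required_share:
            return label
    # Table 4 states no rule outside these windows (e.g. between 80% and 100%).
    return None


def end_of_course_criticality(final_mips, threshold=MIP_THRESHOLD):
    """Return "X" if at least 10% of users end the course below the threshold."""
    if not final_mips:
        return None
    share_below = sum(m < threshold for m in final_mips) / len(final_mips)
    return "X" if share_below >= 0.10 else None
```

Using the lower bound rather than the point forecast makes the check deliberately pessimistic: a participant counts as on track only if even the least favorable estimate exceeds the threshold.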
  Our models for tracking critical issues allow for timely tutoring actions, which, thanks to the predictive
model of the final performance, we can define as preventive tutoring.
  Tracking critical issues during the course also allows the ex-post evaluation of the design, delivery,
and validity of tutoring actions.

3.3. Data based tutoring interventions
Once the criticality criteria are defined, we must pair them with specific intervention criteria based on
observed data. Table 5, Table 6, and Table 7 show specific intervention actions for critical issues, as
defined in Table 2, Table 3, and Table 4, respectively.


          Table 5: Interventions proposed for participation’s criticalities (CPA ) as defined in Table 2.
CRITICALITY                  INTERVENTIONS
CPA type A                   a1) Verify that everyone received login information;
                             a2) verify with the client and/or the managers that there are no work
                             contingencies (deadlines, unforeseen events, workloads, etc.) that affect the
                             people’s availability;
                             a3) check the sub-index of IT adequacy to verify if there are technological
                             barriers that influence the correct performance of the study activities;
                             a4) check the sub-indices of results and pace of study and verify on the platform
                             the actual times recorded;
                             a5) assess people’s level of engagement and motivation for the course.

CPA type B                   b1) Send reminders to those who have not accessed the platform, remotivating
                             the importance of participating in the course;
                             b2) for those who have started, motivate participation to recover any study gaps
                             compared to the expected;
                             b3) remind everyone of times and deadlines;
                             b4) check the sub-index results (Pace of study and IT adequacy), and verify the
                             raw participation times recorded, signaling users to respect the study times
                             provided to obtain certificates and certifications.

CPA type C                   c1) Recover participants who have been left behind, for whom it is necessary to
                             ask for an effort of participation;
                             c2) identify and quantify the cases of proper abandonment of the course;
                             c3) check the sub-index results (Pace of study and IT adequacy), and verify the
                             raw participation times recorded.
                             For people who have lagged in the fruition, i.e. who have not connected or who
                             have short times compared to the expected, it is essential to provide precise
                             indications to complete the course, indicating average daily study time,
                             minimum necessary activities, and operational advice. For people with study
                             times that do not comply with expected, proceed to report that it is essential to
                             respect the study times indicated to obtain certificates and certifications.


1 Recall, as described in [2], the MIP assumes values between 0 and 100.
2 Percentage of times and users for participation, structure and performance; threshold value of the index for
  predicting performance.
CRITICALITY             INTERVENTIONS
CPA type D              d1) Communicate to participants that they are at risk of non-completion and
                        non-compliance with the requirements;
                        d2) verify the problems encountered;
                        d3) highlight a possible study plan to recover and complete the course.
                        For people who have fallen behind in using the course, i.e. who have not
                        connected or whose times are short compared to those expected, it is essential
                        to provide precise indications to complete the course, indicating average daily
                        study time, minimum necessary activities, and operational advice.


         Table 6: Interventions proposed for structure’s criticalities (CS) as defined in Table 3.
CRITICALITY             INTERVENTIONS
CS type Z               z1) Verify that organizational times and conditions make the schedule adequate
                        for the structure of the course and its expected complexity;
                        z2) verify the incidence of technical factors.


          Table 7: Interventions proposed for performance criticalities (CPE) defined in Table 4.
CRITICALITY             INTERVENTIONS
CPE type A              a1) Act on people by reminding them of tips for effective and productive study.
                        Refer them to the tutor for personalized advice.

CPE type B              b1) Identify lagging users, and check their actual completion status and the
                        sub-indexes.

CPE type C              c1) Identify lagging users, and check their actual completion status and the
                        sub-indexes.

CPA type X              x1) The course is over. Among the participants who completed the course,
                        more than 10% have an index below 70.
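
The type X condition above is a simple proportion rule and can be sketched as follows. This is a minimal sketch, not the LearnalyzeR implementation: the function name and parameters are hypothetical, and we assume that "index" refers to the MIP of each participant who completed the course.

```python
def has_type_x_criticality(completed_mips, index_threshold=70.0, share_threshold=0.10):
    """Return True when more than `share_threshold` of the completers
    have an index strictly below `index_threshold`.

    `completed_mips` holds one index value per participant who completed
    the course (assumed here to be the MIP, on a 0-100 scale).
    """
    if not completed_mips:
        return False  # no completers: the rule cannot fire
    below = sum(1 for mip in completed_mips if mip < index_threshold)
    return below / len(completed_mips) > share_threshold


# Hypothetical values, for illustration only.
print(has_type_x_criticality([85, 82, 78, 74, 91, 66, 64, 88, 79, 80]))
```

In this invented example, two completers out of ten are below 70, so the share (20%) exceeds the 10% limit and the criticality is raised.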


4. Visualizations and evaluation of critical issues in LearnalyzeR
In this section, we report examples of some courses in which, through the features of LearnalyzeR,
it is possible to view and investigate the reasons that lead to the detection of critical issues with the
criteria previously described. Using a medical metaphor, recall that the criticality algorithms
identify symptoms. At the same time, through LearnalyzeR’s visualizations of the MIP, the sub-indices, and
the variables that compose them, we can understand the causes (diagnosis) and prepare targeted tutoring
actions (therapy). As our customers asked, in the following examples, we will show the logic of analysis
passing from one course to another without fully showing the critical issues of one or more courses.
    The first step in LearnalyzeR is the visualization of the median value of the Macro Index of Performance
(MIP). Since it can assume different states, we summarize its state with a colored dot over the median
value, using the following color coding (Figure 2):

    • Gray: not calculable;
    • Red: inadequate performance with many critical elements (< 50);
    • Orange: quite adequate performance with few critical elements (≥ 50 and < 70);
    • Yellow: good performance with areas for improvement (≥ 70 and < 80);
    • Green: excellent performance (≥ 80).

Figure 2: Median and IQR of the MIP for a course.

Figure 3: Safety at work courses, population 1
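
The color coding above maps directly onto a threshold function. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and a "not calculable" median is assumed to be represented as `None` or NaN.

```python
import math
from typing import Optional


def mip_color(median_mip: Optional[float]) -> str:
    """Map the median MIP (0-100) of a course to its dot color."""
    if median_mip is None or math.isnan(median_mip):
        return "gray"    # not calculable
    if median_mip < 50:
        return "red"     # inadequate performance, many critical elements
    if median_mip < 70:
        return "orange"  # quite adequate performance, few critical elements
    if median_mip < 80:
        return "yellow"  # good performance with areas for improvement
    return "green"       # excellent performance


print(mip_color(74), mip_color(65))
```

With the medians reported in Example 1 of Section 4.1, a median of about 74 falls in the yellow band and a median of about 65 in the orange band.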

  In the rest of the section, we will report some examples from a public administration. The intent is to
show how the LearnalyzeR framework (MIP, sub-indices, and visualizations) meets the need to find
the causes of course criticalities.

4.1. Examples on Public Administration Customer
For an Italian Public Administration, we delivered two series of courses over time, Digital skills and Safety
at work, to two different types of participant population. Here are some representative examples.
   Example 1 – Comparison between different user populations on the same courses
   Here we show the results for two populations on two courses (General Safety Training and
Security Training Update).
   We observed a difference in performance between population 1 (median MIP around 74, Figure 3)
and population 2 (median MIP around 65, Figure 4).
   We also observed a greater variability of study behavior in population 2; there is a long tail on
the left (towards the low values of the MIP, down to a value of 27) for both Safety courses (General Training
and Updating). Population 1, instead, behaves homogeneously, with the MIP ranging between 69 and 78.

Figure 4: Safety at work courses, population 2

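
The comparison in Example 1 rests on the median and spread of the MIP in each population. A minimal sketch of such a summary follows; the function name is hypothetical and the MIP values are invented for illustration only (they are not the data behind Figures 3 and 4).

```python
from statistics import median, quantiles


def summarize_mip(values):
    """Return median, quartiles, IQR, and minimum of a list of user MIP values.

    Requires at least two values; quartiles use the "inclusive" method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "q1": q1, "q3": q3,
            "iqr": q3 - q1, "min": min(values)}


# Hypothetical MIP values, invented for illustration only.
population_1 = [69, 71, 73, 74, 75, 76, 78]
population_2 = [27, 45, 58, 65, 68, 72, 75]

for name, values in (("population 1", population_1), ("population 2", population_2)):
    print(name, summarize_mip(values))
```

A larger IQR together with a much lower minimum is the numerical counterpart of the long left tail described for population 2.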
   Example 2 – Detailed analysis of a course criticality. We analyzed the MIP and the sub-indices in
detail, observing the variables that make up the individual indicators. As noted, the course structure has
an average complexity because, despite having a high methodological complexity (the value of the variable
methodological complexity is 100, the maximum), there is no test, and registration is not mandatory.
The running course presents a participation criticality of type D, and the sub-index Results has a median
of 50. In the case under study, this indicator directly gives the degree of course completion, since there
is no test and no badge.
   Both the participation criticality and the similar behavior of the running and concluded courses lead
the tutor to undertake engagement actions. Users have good IT adequacy, but there is a problem with
open sessions (the variable open sessions is 60). Combined with the structure criticality, this
information shows the tutor that the time of use/study is not being used effectively. The pace of study,
although adequate, presents problems. Users have an adequate time of use, but the effective days of study
and the number of accesses are deficient.
   According to forecasts, 95% of users are hard workers (final MIP predicted between 70.6 and 85.8,
95% C.I.). Still, comparing this running course with the concluded one, we expect this forecast to change
over time. We expect a dropout at 50% of course completion. Users in the first half of
the course performed well despite problems in the distribution of study time: open sessions and low
values of the actual study days variable. Macro-index forecasts are above the threshold, but no user
goes beyond 50% completion in either the running or the concluded course.

4.2. Examples from analysis of didactical units
To analyze the courses that present structure criticalities and the problem encountered in example 3 of
Section 4.1, we need to observe how participants use the individual didactical units (ud). In
particular, we focused on the number of accesses in the individual ud and the time of use per ud. This
analysis, combined with the knowledge of the course structure, allowed us to understand any critical
issues in the individual ud:

    • a high number of accesses, or none
    • high or low time of use
    • high population behavior variability in the number of accesses and the use time.
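
The three checks above can be sketched as a per-unit flagging function. This is an illustrative sketch only: the source does not give cut-offs, so every threshold (`high_access`, `time_tolerance`, `cv_limit`), the use of the coefficient of variation as the variability measure, and all names are assumptions.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 when the mean is 0)."""
    avg = mean(values)
    return pstdev(values) / avg if avg > 0 else 0.0


def flag_unit(accesses, minutes, expected_minutes,
              high_access=5.0, time_tolerance=0.5, cv_limit=1.0):
    """Flag one didactical unit from per-user access counts and minutes of use.

    `accesses` and `minutes` are non-empty lists with one value per user.
    All thresholds are hypothetical defaults, not values from the paper.
    """
    flags = []
    avg_accesses = mean(accesses)
    if avg_accesses == 0:
        flags.append("no accesses")
    elif avg_accesses > high_access:
        flags.append("high number of accesses")

    avg_minutes = mean(minutes)
    if avg_minutes > expected_minutes * (1 + time_tolerance):
        flags.append("high time of use")
    elif avg_minutes < expected_minutes * (1 - time_tolerance):
        flags.append("low time of use")

    # Variability is checked on both variables; the larger one decides.
    if max(coefficient_of_variation(accesses),
           coefficient_of_variation(minutes)) > cv_limit:
        flags.append("high variability")
    return flags


# Hypothetical unit: one user does all the work, three never open it.
print(flag_unit([0, 0, 0, 12], [0, 0, 0, 120], expected_minutes=30))
```

In practice the expected time per unit would come from the course structure, which is why the analysis is combined with knowledge of that structure.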


5. Conclusions
The systematic use of a criticality detection system ensures a regular and rigorous
procedure for analyzing behaviors and results. It monitors many courses and participants effectively,
even when the courses differ in methods and duration. These measures also make it possible
to modulate data-driven tutoring systems and to analyze the impact of tutoring
on the indicator. Adopting a statistical approach also means exploiting the information
capital present in historical data to enhance the predictive system. Through prediction and the
systematic analysis of the macro-index, sub-indices, and criticalities, tutors can act selectively in
the most at-risk situations. They can modulate their intervention across the different categories
of problems, individualizing communication, or generalizing or delaying it when that is useful and the
situation is less critical. Data-based tutoring therefore becomes an intervention system consistent with
the training system and independent of the tutors’ sensitivities, guaranteeing a systematic and fair
approach for all participants. Likewise, a data-driven intervention system allows tutors to evaluate the
effectiveness of their actions and to identify the best solution for individual critical issues. The tutoring
action thus starts from the same analysis scheme, giving tutors intervention methods that
specialize in how to communicate with the participant. Above all, the effectiveness of these interventions
can be evaluated by analyzing the indicators and the post-intervention criticalities, to
determine which actions, tools, and messages add the most value and work best for the
different categories of problems. The positive impact of the tutoring action is generally little measured
and largely delegated to the expertise of the tutors, who need to review their efforts in terms of effects
so that the intervention schemes evolve into an accurate guide for learning.
   Our approach points to several future steps. New features to integrate into the predictive
and classification system include:

    • Include the tutoring actions performed following the analysis in the macro-index track, taking into
      account both the actual tutoring actions and those of a systematic type, to evaluate their impact on
      subsequent results.
    • Study the evaluation thresholds of the sub-indices to decode the critical areas better.
    • Integrate extra measures for the sub-category Results into the evaluation of the macro-index:
         – Focus on the analysis of learning assessment;
         – Observed results/work performance.
    • Improve the satisfaction measurement in the category didactic tutoring, also through NLP analysis.


References
 [1] T. G. Reio, T. S. Rocco, D. H. Smith, E. Chang, A critique of Kirkpatrick’s evaluation model, New
     Horizons in Adult Education and Human Resource Development 29 (2017) 35–53.
 [2] D. Pellegrini, M. Santoro, S. Zuzzi, Learning analytics and governance of the digital learning
     process, in: teleXbe, 2021.
 [3] D. L. Kirkpatrick, Techniques for evaluating training programs, Journal of the American Society
     of Training Directors 13 (1959) 3–9.
 [4] D. L. Kirkpatrick, Techniques for evaluating training programs: part 2 – learning, Journal of the
     American Society of Training Directors 13 (1959) 21–26.
 [5] D. L. Kirkpatrick, Techniques for evaluating training programs: part 3 – behavior, Journal of the
     American Society of Training Directors 14 (1960) 13–18.
 [6] D. L. Kirkpatrick, Techniques for evaluating training programs: part 4 – results, Journal of the
     American Society of Training Directors 14 (1960) 28–32.
 [7] T. T. Baldwin, J. K. Ford, Transfer of training: a review and directions for future research, Personnel
     Psychology 41 (1988) 63–105. doi:10.1111/j.1744-6570.1988.tb00632.x.
 [8] R. A. Noe, Trainees’ attributes and attitudes: Neglected influences on training effectiveness,
     Academy of Management Review 11 (1986) 736–749.
 [9] L. A. Burke, H. M. Hutchins, Training transfer: An integrative literature review, Human Resource
     Development Review 6 (2007) 263–296. doi:10.1177/1534484307303035.
[10] D. L. Kirkpatrick, Great ideas revisited: Revisiting Kirkpatrick’s four-level model, Training and
     Development 50 (1996) 54–58.
[11] D. L. Kirkpatrick, J. Kirkpatrick, Transferring learning to behavior: Using the four levels to improve
     performance, Berrett-Koehler Publishers, 2005.
[12] R. S. Kaplan, D. P. Norton, The balanced scorecard, 1996.
[13] R. Catalano, D. Kirkpatrick, Evaluating training programs: The state of the art, Training and
     Development Journal 22 (1968) 2–9.
[14] P. E. Kennedy, S. Y. Chyung, D. J. Winiecki, R. O. Brinkerhoff, Training professionals’ usage and
     understanding of Kirkpatrick’s level 3 and level 4 evaluations, International Journal of Training
     and Development 18 (2014) 1–21. doi:10.1111/ijtd.12023.
[15] D. Eseryel, Approaches to evaluation of training: Theory and practice, Journal of Educational
     Technology and Society 5 (2002) 93–98. URL: http://www.jstor.org/stable/jeductechsoci.5.2.93.
[16] E. Bartezzaghi, M. Guerci, M. Vinante, La valutazione stakeholder-based della formazione continua,
     Franco Angeli, Milano, 2010.
[17] M. Coldwell, T. Simkins, Level models of continuing professional development evaluation: a
     grounded review and critique, Professional Development in Education 37 (2011) 143–157.
[18] D. S. Bushnell, Input, process, output: A model for evaluating training, Training and Development
     Journal 44 (1990) 41–43.
[19] R. O. Brinkerhoff, Achieving results from training, Jossey-Bass, San Francisco, CA, 1987.
[20] R. O. Brinkerhoff, S. J. Gill, The learning alliance, Jossey-Bass, San Francisco, CA, 1994.
[21] R. Swanson, Analysis for improving performance: Tools for diagnosing organizations and
     documenting workplace expertise, Berrett-Koehler Publishers, 2007.
[22] F. E. Holton, S. S. Naquin, New metrics for employee development, Performance Improvement
     Quarterly 17 (2004) 56–80. doi:10.1111/j.1937-8327.2004.tb00302.x.
[23] E. F. Holton, R. Bates, W. E. A. Ruona, Development of a generalized learning transfer system
     inventory, Human Resource Development Quarterly 11 (2000) 333–360.