Users behavioural inference with Markovian
         decision process and active learning

       Firas Jarboui (1,2), Vincent Rocchisani (2), and Wilfried Kirchenmann (2)
                  (1) ENSTA, France and ENIT, Tunisia
                  (2) ANEO, Boulogne Billancourt, France
                {fjarboui,vrocchisani,wkirschenmann}@aneo.fr


1     Introduction
Studies of the users of Massive Open Online Courses (MOOCs) discuss the existence
of typical profiles and their impact on the students' learning process. One of
the concerns when creating a new MOOC is knowing how users behave when going
through its content. Existing approaches are either quantitative methods, which
infer groups of similar behaviour that are hard to interpret [1], or qualitative
methods, which are hard to transpose to other contexts [2]. Our ambition is to
find an efficient way to identify the behavioural patterns of interest to a given
human expert. Within the #MOOCLive project³, we developed a mixed method that
matches the quantitative interpretation to the needs of the context.


2     Methodology
We tackled the following three problems in order to achieve our goal:
 – The definition of a quantitative metric to compare behaviours.
 – The inference of qualitative sets of behaviours from existing ones, and the
   test of their reliability in describing reality.
 – The convergence between the quantitative clustering and the qualitative sets
   of behaviours, in order to classify the users accordingly.
To this end, we define three main tasks. We start by quantifying the interest
of users in the platform's activities, which allows us to define a distance
between their behaviours. Then, we iteratively formulate hypothetical class
definitions and test how well they fit the existing population. This is repeated
until both the classifier and the classes suggested by the process are deemed
satisfactory. The breakdown of this process is shown in Fig. 1.
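The iterative process described above can be sketched as a control loop in which the two human steps are callbacks. This is a hypothetical sketch, not the authors' code: `define_classes` stands for the expert's class definition (step 2 below), `fit_and_review` stands for one fold of label propagation followed by the expert's review and is assumed to return the labels plus the rates of changed and unsure tags (step 3 below), and the thresholds and iteration caps are assumed parameters.

```python
from typing import Callable, Dict, List, Tuple

Classes = List[str]        # hypothetical: names of the expert's classes
Labels = Dict[str, str]    # hypothetical: user id -> class name or "unsure"
Review = Tuple[Labels, float, float]  # labels, changed rate, unsure rate


def infer_behaviours(
    define_classes: Callable[[Labels], Classes],
    fit_and_review: Callable[[Classes], Review],
    changed_threshold: float,
    unsure_threshold: float,
    max_rounds: int = 10,
    max_folds: int = 50,
) -> Tuple[Classes, Labels]:
    """Alternate expert class definitions and classifier fitting until
    the rate of 'unsure' tags is low enough (or the rounds run out)."""
    labels: Labels = {}
    classes: Classes = []
    for _ in range(max_rounds):
        # Step 2: the expert (re)defines the classes from their a priori,
        # possibly informed by the labels of the previous round.
        classes = define_classes(labels)
        unsure_rate = 1.0
        # Step 3: active learning loop; refit while the expert still
        # changes too many of the sampled labels.
        for _ in range(max_folds):
            labels, changed_rate, unsure_rate = fit_and_review(classes)
            if changed_rate <= changed_threshold:
                break
        # Converged with respect to the expert: few unsure tags remain.
        if unsure_rate <= unsure_threshold:
            break
    return classes, labels
```

Keeping the expert steps as callbacks separates the control flow (when to keep refitting, when to roll back to the class definitions) from how the expert is actually consulted.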
1. Quantitative modelling of the user: We model the structure of the MOOC as a
   Markov decision process (MDP). Let H be the history of actions the user
   performed on the platform. We define the gain function $\hat{G}_H$ of a user
   as the expected value of a categorical soft-max probability distribution over
   $S_{GF}$, a sample from the space of all possible gain functions.
³ #MOOCLive Virchow-Villermé ANR-15-IDFN-0003-04



   The value associated with each element of this sample is the sum of the
   rewards that the user's actions would yield under the given gain function,
   as discussed in detail in [3]. Each user is then characterised by the
   expected utility of each state under a discount factor γ.
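$$
\begin{cases}
U(G \mid H) = \sum_{a \in H} G(a) \\[4pt]
P(G \mid H) = \dfrac{e^{U(G \mid H)}}{\sum_{G' \in S_{GF}} e^{U(G' \mid H)}}
\end{cases}
\;\Rightarrow\;
\hat{G}_H = \sum_{G \in S_{GF}} G \times P(G \mid H)
$$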

2. Qualitative class definition: This step is purely human. The experts are
   asked to intervene and define the classes that will be used to build the
   quantitative classification. At this stage, the expert's intervention is based
   solely on their a priori. If this a priori is invalidated later in the
   process, the expert restarts from this step with an updated point of view.
3. Fitting the classification: To classify the users, we use label propagation
   with a Gaussian kernel. For each user behaviour, this provides a probability
   distribution of membership over the patterns. An active learning process
   iterates the label propagation under the supervision of the human expert.
   After each iteration, we randomly sample users and check whether the output
   probability distribution makes sense. The human expert either agrees with the
   results, changes them, or tags them as unsure.
   If the rate of changed results is high, we continue the active learning loop;
   as the loop proceeds, the rate of incorrect labels decreases.
   Once the classifier stabilizes, we consider the rate of behaviours that the
   expert tagged as unsure. If this rate exceeds a threshold, we roll back to the
   second step to challenge the a priori class definitions.
   If the rate of unsure tags is low enough, we can safely assume that the two
   models have converged with respect to the expert.
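The two computational steps, the gain function of step 1 and the label propagation of step 3, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a gain function is assumed to be a dictionary from action names to rewards, the expected gain function $\hat{G}_H$ is used directly as the user's feature vector (the discounted state utilities mentioned in step 1 are omitted), and the propagation follows the classic Gaussian-kernel label propagation scheme with the expert's labels clamped. The names `gain_features` and `label_propagation` and the kernel width `sigma` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

GainFunction = Dict[str, float]  # hypothetical: action name -> reward


def gain_features(history: Sequence[str], sample: List[GainFunction]) -> List[float]:
    """Expected gain function G_H under the soft-max posterior P(G|H),
    returned as one value per action (actions sorted by name)."""
    # U(G|H): sum of the rewards of the observed actions under G.
    utilities = [sum(g.get(a, 0.0) for a in history) for g in sample]
    # Soft-max over the sample (shifted by the max for numerical stability).
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    probs = [w / total for w in weights]
    # G_H = sum over G of G * P(G|H), evaluated action by action.
    actions = sorted({a for g in sample for a in g})
    return [sum(p * g.get(a, 0.0) for p, g in zip(probs, sample)) for a in actions]


def label_propagation(
    features: List[List[float]],
    labels: List[Optional[int]],
    n_classes: int,
    sigma: float = 1.0,
    n_iter: int = 100,
) -> List[List[float]]:
    """Gaussian-kernel label propagation. labels[i] is a class index for
    users labelled by the expert and None otherwise; returns one membership
    distribution over the classes per user."""
    n = len(features)
    # Kernel weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    w = [[math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2 * sigma ** 2))
          for xj in features] for xi in features]
    uniform = [1.0 / n_classes] * n_classes

    def one_hot(c: int) -> List[float]:
        return [1.0 if k == c else 0.0 for k in range(n_classes)]

    f = [one_hot(lab) if lab is not None else list(uniform) for lab in labels]
    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            lab = labels[i]
            if lab is not None:
                new_f.append(one_hot(lab))  # expert labels stay clamped
                continue
            # Kernel-weighted average of the other users' distributions.
            row = [sum(w[i][j] * f[j][k] for j in range(n) if j != i)
                   for k in range(n_classes)]
            s = sum(row)
            new_f.append([r / s for r in row] if s > 0 else list(uniform))
        f = new_f
    return f
```

Clamping the expert's labels at every iteration is what lets the active learning loop feed corrections back: a label changed by the expert during review becomes a clamped entry of `labels` for the next fold.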
We applied this methodology to a MOOC⁴ together with a sociologist. We started
with an a priori of three user profiles. To date, after three iterations of the
methodology, we have identified seven profiles that fulfil the needs of the
context and classified the users accordingly.


3     Conclusion

Our method assists a human expert in extracting the most relevant information
about the studied population. Although this work is still in progress and has
only been tested on MOOC log data, it should be applicable to other streams of
log data. Future tests will involve marketing-related data. We are currently
investigating the efficiency of this method as well as the best techniques to
use for each step. This is preliminary work for a thesis.


⁴ https://www.fun-mooc.fr/courses/VirchowVillerme/06005/session01/about




                     Fig. 1. Users behavioural inference process

[Diagram. Quantitative modelling of the users: course model as an MDP; users'
log data; MDP utility functions as users' features. Qualitative class
definition: user classes (expert a priori); qualitative analysis to redefine the
classes. Fitting the classification: sample of users; users sample labeling;
label propagation (Gaussian kernel); evaluate propagation; active learning while
error > threshold; evaluate classification once error < threshold; satisfactory
results.]


References
1. Chase Geigle and ChengXiang Zhai: Modelling MOOC Student Behaviour With
   Two-Layer Hidden Markov Models. Learning at Scale (2017)
2. Carleton Coffrin, Linda Corrin, Paula de Barba and Gregor Kennedy: Visualizing
   patterns of student engagement and performance in MOOCs. (2014)
3. Constantin A. Rothkopf and Christos Dimitrakakis: Preference Elicitation and
   Inverse Reinforcement Learning. Cornell University Library (2011)

