                            Predicting dropouts on the successive
                                     offering of a MOOC

Massimo Vitiello1, Simon Walk1, Denis Helic1, Vanessa Chang2, and Christian Guetl1,2

1 Graz University of Technology; massimo.vitiello@student.tugraz.at, simon.walk@tugraz.at, dhelic@tugraz.at, cguetl@iicm.edu
2 Curtin University of Technology; vanessa.chang@curtin.edu.au


Abstract. In recent years, we have experienced the rise of e-learning and the growth of available Massive Open Online Courses (MOOCs). Consequently, an increasing number of universities have dedicated resources to the development and publishing of MOOCs on portals. A common practice for operators of such MOOCs is to re-offer the same course over the years with similar modalities and minor improvements. Such re-runs, like most MOOCs, are still affected by a low percentage of enrolled users who manage to successfully complete the courses. Hence, analyzing similar MOOCs can provide valuable insights to better understand users' reasons for dropping out and can potentially help MOOC administrators to better shape the structure of the courses to keep users engaged. To that end, we analyze two successive offerings of the same MOOC, created by Curtin University and published on the edX platform. We extract features for our prediction experiment to detect dropouts, considering two different metrics: (i) a percentage of users' active time and (ii) the initial week after users' first interaction with the MOOC. We train a Boosted Decision Tree classifier with the features extracted from the original run of the MOOC and predict dropouts on its re-run. Furthermore, we identify a set of features that likely indicate whether users will drop out in the future. Our results indicate that users who interact with particular tools at the very beginning of a MOOC are more likely to successfully complete the course.



               1    Dropouts in MOOCs
Online learning has changed enormously over the past years. The Internet became the channel on which a variety of new types of learning methodologies materialized. Massive Open Online Courses (MOOCs) emerged as the natural solution for offering distance education. The advantages of MOOCs are manifold. They are Massive, in the sense that a potentially unlimited audience can enroll; Open, since the majority of them have no enrollment costs, nor do they require particular prerequisites; Online, as all that is needed is an internet connection, without any geographical limitation; and a Course, as their structure resembles traditional lectures, with assignments and exams [9, 10]. Despite these advantages, the expectations MOOCs carried with them have not yet been completely met.




Nearly all MOOCs suffer from high dropout rates. While the number of users who enroll is high, the percentage of those who successfully complete the course is very low (generally lower than 10%) [7]. High dropout rates are a problem not only for single runs of MOOCs but also for successive offerings of the same course. We refer to a new offering of the same MOOC, happening at a later point in time, as a re-run. Re-runs are characterized by structures, topics, and schedules similar to those of the original course. Therefore, less effort is required to organize them. Furthermore, content creators can make use of previous users' feedback to modify and reshape the re-run in order to better meet users' goals and expectations, while generally enhancing the overall learning experience. Investigating and comparing re-runs of the same MOOC can help us increase our understanding of users' learning styles and of how to support them, with the clear goal of mitigating dropout rates.
In this paper, we experiment with early dropout detection on MOOC re-runs. Particularly, we analyze two MOOCs offered on edX by Curtin University (Perth, Western Australia)3, the second of which is the re-run of the first. First, we investigate a varying percentage of users' total active time. Second, we focus on the first week of interactions of each user. This allows us to verify whether users' behavior during the initial stage of the course is a strong indicator of their outcome and whether this also holds for re-runs. Furthermore, we identify which features are the most valuable for correctly predicting whether users will drop out.

               2      Related Work
Gütl et al. [5] analyzed survey answers of users who dropped out across MOOCs offered by Universidad Galileo. They proposed an Attrition Model for Open Learning Environment Setting (AMOES), which builds upon and extends the Funnel of Participation model [2], and splits attrition into healthy and unhealthy. Within healthy attrition, they identify three subgroups according to users' goals, expectations, and reasons for dropping out. Particularly, users are classified as either Exploring User, Content Learner, or Restricted Learner.
    Coffrin and colleagues [3] studied two MOOCs developed at the University of Melbourne. Principles of Macroeconomics was an introductory course with the material available in a linear structure over its 8-week duration. Discrete Optimization was a self-paced graduate-level course that lasted 9 weeks. The authors used a linear regression model and analyzed the relation between users' final grades and their interactions during the first two weeks of the MOOCs. They also used State Transition Diagrams to discover similarities across the users of the MOOCs by visualizing assignment and weekly video interactions.
    Kloft et al. [8] experimented with weekly dropout classification using Support Vector Machines (SVMs) on a 12-week MOOC with an 81.4% dropout rate offered on Coursera. The authors used cumulative features (number of interactions, number of views of each page of the course) and technical ones (browser, OS, number of screen pixels). They compared a trivial baseline (always predicting one or the other class) to the performance of the SVM, which showed higher accuracy.
                3
                    https://www.edx.org/school/curtinx




Amnueypornsakul and colleagues [1] experimented with dropout classification using SVM on a MOOC with roughly 30,000 enrolled students. The authors used quiz-related and activity-related features, verifying by ablation analysis that both sets are important for the prediction task. Furthermore, they noticed that the class imbalance and the presence of users with few interactions (Inactive) complicate the classification task.
    In our previous work [13], we experimented with dropout prediction across 5 MOOCs offered by Universidad Galileo on the university portal Telescopio. We derived features from users' sessions and also considered the number of times a tool was used as a measure. We analyzed the MOOCs using SVM and K-Means as classifiers and tested different combinations of features. In our results, K-Means always fell behind SVM, and the prediction was improved by different combinations of features. In 2017, we refined our approach in [12] and developed a general classifier for dropout detection across different MOOC platforms.
    Teusner et al. [11] analyzed 3 iterations of the MOOC "In-Memory Data Management (IMDM)" offered on the openHPI platform, hosted and developed by the Hasso Plattner Institute in Potsdam. The MOOC was held in English and was intended for learners with a business background and academics. While the content of the first two iterations barely differed, the third offering was improved according to user feedback: some units were reshaped to ease understanding, and about 60% of all videos were modified. The authors concluded that future iterations with stable material attract a wider audience at low effort, that using the forum for content-based communication can help to promote users' engagement, and that content-related feedback should be introduced and addressed as quickly as possible.

               3      Materials and Methods
               3.1     Datasets
               Our dataset consists of the original offering of a MOOC, referred to as MOOCC1,
               and the first of its re-runs, coded as Re-Run1. Both MOOCs were created by
               Curtin University and were available on edX4 . The original offering MOOCC1
                4
                    https://www.edx.org/school/curtinx


Table 1. Summary of the MOOC and its re-run. Active users are the subset of Enrollments who conducted at least one interaction beyond enrollment; the remaining users form the Inactive class. The Dropouts column reports the number of users who dropped out relative to the Active users and, in brackets, relative to the Enrollments. Analogously, the dropout rates are given relative to the Active users and to the Enrollments.

                MOOC        Enrollments Active Inactive Completers      Dropouts      Dropout Rate
                MOOCC1         21948      13396    8552        1500   11896 (20448)    89% (93%)
                Re-Run1        10368       5932    4436        208     5724 (10160)    96% (98%)




was available online during the second semester of 2015, while the re-run Re-Run1 was available online between April and May 2016. Neither offering had entry prerequisites. However, users needed access to YouTube in order to watch the independently created video content. Regular activities, such as polls, questions, and discussion board tasks, were integrated into the course content. The course syllabus consisted of a total of four modules, each estimated to require a time commitment of two hours per week. An extra introductory module and a course wrap-up module completed the course calendar. To complete each of the four main modules, participants needed to complete an activity and a quiz, with the quizzes being an extension of the activities. Therefore, engaging in the activities helped participants to answer the questions in the quizzes. Each quiz accounted for 25% of the final grade, with a Certificate of Achievement issued to participants with an overall score equal to or greater than 70%. The team of instructors consisted of an associate professor and a teaching assistant. The two offerings were largely similar to each other regarding content and activities, with Re-Run1 undergoing only some minor changes. Table 1 reports a summary of the enrollments and completions of the original MOOC and of its re-run. As the column Enrollments reports, MOOCC1 has a total of 21,948 enrolled users, and its re-run, Re-Run1, counts 10,368 enrolled users. Within the enrolled users, we distinguish users who enroll and leave the MOOCs without engaging any further (i.e., Inactive users) from those who have more than the simple enrollment interaction (i.e., Active users). Completers are users who successfully completed the MOOCs, while Dropouts are those who failed to do so. Overall, the dropout rates are never lower than 89%.
    Both offerings are structured in a self-paced manner and are organized in two phases. During the first phase, users can only access the course's main page and enroll. At this stage, the course material is not available, and only a limited number of interactions is possible. This initial phase lasts roughly 2 months for both MOOCs. At the beginning of the second phase, the course material is uploaded all at once, and users can engage at their own pace. Enrollment remains possible during the second phase. After the official end of the MOOCs, users can still register and interact with the MOOC, but in this case they cannot obtain a certificate, as the course is already officially over. Due to these settings, we consider only enrollments happening before the official end of the MOOC. Furthermore, we also discard interactions happening before the course's official start: such interactions do not represent users' learning styles because the course material is not yet available. Both MOOCs also included a course forum where users can post and discuss.
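
As a small sketch of this filtering step, with the event-record fields and the "enrollment" event type being our assumptions rather than the actual edX log schema:

```python
def filter_events(events, official_start, official_end):
    """Keep enrollment events that happen before the official end of the
    MOOC, and keep all other interactions only from the official start
    onwards, when the course material becomes available. Each event is
    assumed to be a dict with "type" and a datetime "timestamp"."""
    kept = []
    for e in events:
        if e["type"] == "enrollment":
            if e["timestamp"] < official_end:
                kept.append(e)  # enrollments count only until the official end
        elif e["timestamp"] >= official_start:
            kept.append(e)      # pre-start clicks do not reflect learning behavior
    return kept
```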

               3.2 Experimental Setup
               We extract a set of features to describe each user of the MOOCs from Curtin
               University. First, we calculate a set of time-based features that build upon the
               concept of sessions. A session is a set of chronologically ordered interactions, in
               which each interaction happens within a certain timespan from the previous and
               the next one. Particularly, we use a threshold of 30 minutes and calculate sessions
               for all users. Following this concept of sessions, we define the following features:




Sessions is the total number of a user's sessions; Requests is the total number of interactions per user; Active Time is the total time a user interacted with the MOOC (the sum of all session durations); Days indicates the number of days on which a user interacted at least once with the MOOC. Furthermore, we compute 4 averaged features: Timespan Clicks is the average timespan between two consecutive clicks in the same session (averaged over all sessions); Session Length is Active Time divided by Sessions; Session Requests is Requests divided by Sessions; Day Requests is Requests divided by Days.
Moreover, we exploit the detailed edX logs to identify the type of event triggered and the particular tool each interaction referred to. Curtin University's MOOCs include 6 specific tools5: Course Navigation, Video, Problem, Poll & Survey, Bookmark, and Discussion Forum. For each of these tools, a set of events indicates the particular action that took place. The list of events for each tool, together with the session-related features we calculated, is reported in Table 2. The Video tool is the only one that allows us to distinguish between interactions triggered in the Browser and those triggered on Mobile (through the edX mobile application). To separate these two sources, we create an additional tool, filtering the Mobile interactions out of Video, and name this new tool Video Mobile. We create a feature for each event, counting the number of times it was triggered by users' interactions. Secondly, we define two different approaches and calculate the features according to each, as described next.
                5
                    The complete list of edX’s events is available at http://edx.readthedocs.io


Table 2. Summary of tools and their events. The first column lists the name of each tool; the second column lists the events that refer to that tool.

                        Tools       Events
                 Session Related    Sessions, Requests, Active Time, Days, Timespan Clicks, Session Length,
                                    Session Requests, Day Requests
                Main Page Links     About, Faqs, Home, Instructor, Progress, StudyAtCurtin
                Course Navigation   TabSelected, PreviousTabSelected, NextTabSelected, LinkClicked, OutlineSelected
                                    CaptionHidden, CaptionShown, LanguageMenuHidden, LanguageMenuShown,
                       Video        Loaded, Paused, Played, PositionChanged, SpeedChanged, Stopped,
                                    TranscriptHidden, TranscriptShown
                                    CaptionHiddenM, CaptionShownM, LanguageMenuHiddenM,
                    Video Mobile    LanguageMenuShownM, LoadedM, PausedM, PlayedM, PositionChangedM,
                                    SpeedChangedM, StoppedM, TranscriptHiddenM, TranscriptShownM
                                    Check, CheckFail, FeedbackHintDisplayed, Graded, HintDisplayed, Rescore,
                      Problem
                                    RescoreFail, Reset, ResetFail, Save, SaveFail, SaveSuccess, Show, ShowAnswer
                    Poll & Survey   PollSubmitted, PollViewResults, SurveySubmitted, SurveyViewResults
                     Bookmark       Accessed, Added, Listed, Removed
                                    CommentCreated, ResponseCreated, ResponseVoted, Searched, ThreadCreated,
                Discussion Forum
                                    ThreadVoted




First, we consider all interactions within the first 1% to 100% of the total active time per user and call this setting Scaled Time.
    As a second approach, we calculate the features considering the first 7 days after a user's first interaction. We name this setting Days. To overcome the class imbalance problem [6], we adjust the class distribution of our MOOCs by randomly oversampling the smaller class: randomly picked examples from the smaller class are added until Completers and Dropouts have the same number of samples in our dataset. Each of the two approaches builds the foundation for a classification experiment that we run using Boosted Decision Trees [4]. This ensemble classifier combines a set of single decision trees into one classifier. For each model, the misclassified examples get a higher weight, so that the next decision tree focuses more on correctly predicting them.
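
As a minimal sketch of this pipeline (not the authors' actual code), the snippet below balances the classes by random oversampling and trains a gradient-boosted tree ensemble, with scikit-learn's GradientBoostingClassifier standing in for the Boosted Decision Trees of [4]; the feature matrices are synthetic placeholders for the per-user features described above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def oversample(X, y, rng):
    """Randomly duplicate examples of the smaller class until both
    classes contain the same number of samples."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=counts.max() - counts.min(), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

rng = np.random.default_rng(42)
# Synthetic stand-ins for the per-user feature matrices and dropout labels
# of MOOCC1 (training) and Re-Run1 (testing); 1 marks a dropout.
X_mooc1, y_mooc1 = rng.random((13396, 40)), (rng.random(13396) < 0.89).astype(int)
X_rerun1, y_rerun1 = rng.random((5932, 40)), (rng.random(5932) < 0.96).astype(int)

# Balance the training classes, fit on the original run, predict on the re-run.
X_bal, y_bal = oversample(X_mooc1, y_mooc1, rng)
clf = GradientBoostingClassifier().fit(X_bal, y_bal)
print("accuracy on Re-Run1:", clf.score(X_rerun1, y_rerun1))
```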
    We run classification experiments with Boosted Decision Trees for both approaches, using two different sets of users as input: first all Enrollments, and then only the Active users, as indicated in Table 1. We evaluate our experiments using accuracy, which is calculated as the number of correctly predicted examples divided by the total number of predicted examples. Therefore, this measure takes values between 0 and 1; a value of 0 indicates that all examples have been wrongly classified, while a value of 1 means that all examples have been correctly predicted.
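
In symbols, with $y_i$ the true class, $\hat{y}_i$ the predicted class, and $N$ the number of predicted examples:

$$\text{accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\!\left[\hat{y}_i = y_i\right]$$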
    As a final analysis, we investigate the importance of our features in the classification task. Boosted Decision Trees provide a weight for each feature used. This weight represents the number of times a feature is used to split the data, summed across every single decision tree. Thus, the higher the weight, the more important the feature is for splitting the data. We explore the ranking of the features for both metrics when the input consists only of Active users.
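
Continuing the sketch above (clf is the fitted GradientBoostingClassifier; the feature names are placeholders), such split-count weights could be extracted as follows. Note that scikit-learn's built-in feature_importances_ is impurity-based rather than a split count, which is why the counting is done by hand here.

```python
from collections import Counter

def split_counts(clf, feature_names):
    """Count how often each feature is used to split a node, summed over
    every tree in the boosted ensemble (a rough analogue of the weights
    reported in Table 3)."""
    counts = Counter()
    for stage in clf.estimators_:   # one row of trees per boosting stage
        for tree in stage:          # a single tree per stage for binary labels
            for f in tree.tree_.feature:
                if f >= 0:          # negative entries mark leaf nodes
                    counts[feature_names[f]] += 1
    return counts

# Features sorted from most to least frequently used for splitting
ranking = split_counts(clf, [f"feature_{i}" for i in range(40)]).most_common()
```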

               4    Results & Discussion
Figures 1(a) and 1(b) report the classification results for the two approaches. The x-axes indicate the days after a user's first interaction for the Days experiment and the percentage of a user's active time for the Scaled Time experiment. For both figures, the y-axes indicate accuracy and are bounded between 0.4 and 1. We also plot the baseline as a solid black horizontal line at 0.5. The baseline represents the performance of a classifier that randomly predicts a class. It is a lower bound: classifiers with an accuracy below the baseline are no better than random prediction. Green lines refer to experiments using all enrolled users (Enrollments), while the red ones consider only the active users (Active).
    For the Days experiment, reported in Figure 1(b), the accuracy when only the active users are considered keeps increasing the more days are considered. It is never lower than 0.7 and rises above 0.8 when all 7 days after users' first interaction are considered. This indicates that the first week of user interactions already provides a good indication of which users will eventually drop out. The accuracy when all enrolled users are considered hovers around 0.6, indicating that in this case the feature set does not sufficiently discriminate between the two classes.




[Figure 1: accuracy of dropout prediction on Re-Run1; (a) Scaled Time: accuracy vs. percentage of users' active time (1% to 100%); (b) Days: accuracy vs. days from users' first interaction (1 to 7); lines: Total Users, Active Users, and the baseline at 0.5.]

Fig. 1. Dropout prediction results on Re-Run1. Each subfigure depicts the accuracy results for one of the two proposed approaches. Figure 1(a) reports the results for the Scaled Time approach and Figure 1(b) depicts the results for the Days approach, with the x-axes indicating the considered percentage of users' active time and the considered number of days after users' first interaction with the MOOC, respectively. The y-axes (bounded between 0.4 and 1) indicate the accuracy, with the baseline plotted as a solid black line at 0.5. The green lines refer to the experiments in which we consider all users (cf. Enrollments in Table 1), while the red ones represent experiments in which we analyze only the active users (cf. Active in Table 1). Considering only active users always yields the highest accuracy, except when the considered percentage of a user's active time is larger than 80%. In this case, the green and red lines switch places; however, the overall performance of the prediction experiments differs only minimally.



It is likely that many users register for the MOOCs at an early stage, or at least more than a week before the material becomes available. At this time, there are few possible interactions, and it is likely that users only come back once the material is made available. Thus, this approach yields a lower overall accuracy.
    The results of the Scaled Time experiment are plotted in Figure 1(a). With this setting as well, considering only the active users produces the highest accuracy. Again, the accuracy is never lower than 0.7 and gets as high as 0.96 when the whole of a user's active time is considered. When all enrolled users are taken into account, the accuracy ranges from 0.59 to 0.98, constantly increasing as the percentage of users' active time grows. Both settings show a similar profile once these percentages exceed 80%.
    Table 3 lists the best-performing features for the two approaches when only the active users are considered. The first column lists the Tool and the second the specific features that belong to it. The remaining columns report the weights of the features, first for the Days and then for the Scaled Time experiment. For reasons of space, we report only some values for both approaches. Particularly, we show days 1, 4, and 7 for the Days approach, and 5%, 50%, and 100% of users' active time for the Scaled Time approach. The weights highlighted in bold are the highest for that particular experiment. We see that Progress is always one of the features with the highest weights in both experiments. Interactions of type Progress refer to users accessing a dedicated page to track their scores for single problems and the current overall




course grade. Particularly, this page includes a grading chart, which reports the obtained scores on each graded assignment in the form of a bar chart. Moreover, the page also offers a panoramic overview of the whole set of problems, organized per section and listed in the order they occur in the MOOC. The weight of this feature increases with the days and the time percentage. If we extract the number of interactions of this type for both classes, we obtain a total of 17,240 for the Completers and 30,916 for the Dropouts of Re-Run1. For MOOCC1 we have 121,228 interactions for the Completers and 54,768 for the Dropouts. The class average (i.e., total interactions divided by class size, e.g., 17,240/208 ≈ 82.88) yields 82.88 and 80.98 Progress interactions for the Completers, and 6.88 and 5.40 for the Dropouts of Re-Run1 and MOOCC1, respectively. Moreover, if for the Dropouts we also include the users with only the enrollment action, the averages drop to as low as 3.04 for Re-Run1 and 2.68 for MOOCC1.
    Hence, we find that, together with a high number of correctly solved problems, constantly monitoring one's personal progress strongly indicates whether a user will drop out or not. Similarly, ProblemCheck becomes more significant the more days and the higher percentages of interactions are analyzed.


Table 3. Feature scores of the Days and Scaled Time experiments. The features with the highest scores for 1, 4, and 7 days after users' first interaction with the MOOC and for 5%, 50%, and 100% of the active time per user are boldfaced. Progress is always among the features with the highest scores. Timespan Clicks is among the features with the highest scores for the Days experiment, while Session Length is one of the features with the highest scores for the Scaled Time experiment. ProblemCheck scores increase as we consider more days after users' first interaction and larger shares of active time per user.

                                                                       Days             Scaled Time
                       Tool                  Feature              1      4      7     5%    50% 100%
                                       Timespan Clicks         62.3 53.9 57.4        63.1 28.6     10.7
                                         Active Time           44.6 39.3 32.9        40.6 25.2     16.4
                 Session Related        Session Length         36.3 38.2 28.2        59.5 39.3     56.1
Day Requests             24.6 18.1 17.0        21.8 29.7     28.8
                                       Session Requests        34.0 36.0 35.8        23.5 36.0     22.7
                                         ProblemCheck          21.4   32.2 41.0      28.6 47.5     59.8
                     Problem             ProblemGraded          3.1    7.6  4.0       1.1 13.3     42.7
                                          ProblemShow          18.0   16.8 20.2      18.4 11.6     35.8
                                             Home              28.0 28.8 22.6        28.9 16.5     12.5
                 Main Page Links            Progress           50.6 54.2 54.7        55.1 84.6     93.8
                                         StudyAtCurtin         22.0 18.8 11.0         9.9  2.6      7.7
NextTabSelected          22.1   13.2    11.2   21.8   29.7   14.6
                Course Navigation
                                          TabSelected          26.9   18.3    17.4   29.2    8.8   22.2
                                          VideoLoaded          24.6   20.3    17.7   22.4   11.4    9.0
                      Video
                                          VideoPlayed          16.6   15.2     9.6   22.4   14.2   11.1




This interaction indicates a problem being correctly checked by the system after users submitted an answer. The high scores of this feature come as no surprise, as users are likely to solve problems only after they study and learn from the course material, that is, at a later stage of the MOOC.
    Certain tools and their features never obtain significant weights, such as Video Mobile or Discussion Forum. The low weights for Video Mobile indicate that users mostly interact with the MOOCs using a desktop machine rather than the edX mobile application. Poll & Survey and Bookmark are rarely used tools, either because they are poorly advertised or because users do not regard them as particularly useful for completing MOOCs. It also appears that interactions within the Discussion Forum barely relate to Completers or Dropouts. It is possible that the course structure does not require users to engage with the forum. This could be due to unchallenging courses or, more likely, to the self-paced setting of the MOOCs: users engage at their own pace and confront the same challenges at different times. As a consequence, the role of the forum as a real-time communication channel and as a first source of help might be limited.
               5    Conclusion & Future Work
In this work, we experimented with dropout detection on a MOOC re-run offered on the edX portal by Curtin University. We trained a Boosted Decision Tree classifier on the initial offering of the MOOC and predicted which users would drop out of its re-run. Our results indicate that the first week of users' interactions already provides information about whether users will complete the re-run or eventually drop out. Furthermore, we evaluated the importance of each of our proposed features for the classification. We discovered that the frequency with which users check their progress and correctly solve problems within a short period after first interacting with a MOOC is related to their probability of completing the MOOC. We also noted that certain tools are barely used and, therefore, do not carry any valuable information for the prediction task. Moreover, we found that the benefits of social tools, such as, in our case, the discussion forums, appear to depend on the way MOOCs are organized (i.e., limited benefits for self-paced MOOCs) and on the effort required for users to engage with them.
    Analyses at the tool level represent a valuable next step. Abstracting from the particular event that took place might help to differentiate more precisely between frequently and rarely used tools, and to confirm our current findings. Using different approaches that focus on users' initial interactions will help us learn more about users' behaviors. Analogously to interaction- and click-pattern mining approaches from other domains [14–17], we plan on identifying interaction types of users by clustering users of MOOCs according to their click and interaction patterns to improve dropout detection.

               6    Acknowledgments
               This work is in part supported by the Graz University of Technology, Curtin
               University, and the MOOC Maker Project (http://www.moocmaker.org/, Refer-
               ence: 561533-EPP-1-2015-1-ES-EPPKA2-CBHE-JP). The authors particularly
               thank Curtin University for providing the analyzed datasets.




               References
 1. Amnueypornsakul, B., Bhat, S., Chinprutthiwong, P.: Predicting attrition along the way: The UIUC model (2014)
 2. Clow, D.: MOOCs and the funnel of participation. In: Proceedings of the Third International Conference on Learning Analytics and Knowledge. pp. 185–189. ACM (2013)
 3. Coffrin, C., Corrin, L., de Barba, P., Kennedy, G.: Visualizing patterns of student engagement and performance in MOOCs. In: Proceedings of the 4th International Conference on Learning Analytics and Knowledge. pp. 83–92. ACM (2014)
 4. Friedman, J.H.: Greedy function approximation: A gradient boosting machine. Annals of Statistics, pp. 1189–1232 (2001)
 5. Guetl, C., Chang, V., Hernández Rizzardini, R., Morales, M.: Must we be concerned with the massive drop-outs in MOOCs? An attrition analysis of open courses. In: Proceedings of the International Conference on Interactive Collaborative Learning, ICL2014 (2014)
 6. Guo, X., Yin, Y., Dong, C., Yang, G., Zhou, G.: On the class imbalance problem. In: Natural Computation, 2008. ICNC'08. Fourth International Conference on. vol. 4, pp. 192–201. IEEE (2008)
 7. Jordan, K.: Initial trends in enrolment and completion of massive open online courses. The International Review of Research in Open and Distributed Learning 15(1) (2014)
 8. Kloft, M., Stiehler, F., Zheng, Z., Pinkwart, N.: Predicting MOOC dropout over weeks using machine learning methods (2014)
 9. McAuley, A., Stewart, B., Siemens, G., Cormier, D.: The MOOC model for digital practice (2010)
10. Rodriguez, C.O.: MOOCs and the AI-Stanford like courses: Two successful and distinct course formats for massive open online courses. European Journal of Open, Distance and E-Learning 15(2) (2012)
11. Teusner, R., Richly, K., Staubitz, T., Renz, J.: Enhancing content between iterations of a MOOC: Effects on key metrics (2015)
12. Vitiello, M., Walk, S., Chang, V., Hernández, R., Helic, D., Guetl, C.: MOOC dropouts: A multi-system classifier. In: 12th European Conference on Technology Enhanced Learning, EC-TEL 2017, Tallinn, Estonia. pp. 300–314 (2017)
13. Vitiello, M., Walk, S., Hernández, R., Helic, D., Gütl, C.: Classifying students to improve MOOC dropout rates. pp. 501–508 (2016)
14. Walk, S., Espín-Noboa, L., Helic, D., Strohmaier, M., Musen, M.A.: How users explore ontologies on the web: A study of NCBO's BioPortal usage logs. pp. 775–784 (2017)
15. Walk, S., Singer, P., Espín-Noboa, L., Tudorache, T., Musen, M.A., Strohmaier, M.: Understanding how users edit ontologies: Comparing hypotheses about four real-world projects. In: 14th International Semantic Web Conference, USA, October 2015. pp. 551–568 (2015)
16. Walk, S., Singer, P., Strohmaier, M.: Sequential action patterns in collaborative ontology-engineering projects: A case study in the biomedical domain. In: 23rd ACM International Conference on Information and Knowledge Management, CIKM, Shanghai, China, 2014. pp. 1349–1358 (2014)
17. Walk, S., Singer, P., Strohmaier, M., Tudorache, T., Musen, M.A., Noy, N.F.: Discovering beaten paths in collaborative ontology-engineering projects using Markov chains. Journal of Biomedical Informatics 51, 254–271 (2014)



