<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving PhD Student Journeys with Process Mining: Insights from a Higher Education Institution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>r J. J. L</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>T. Wynn</string-name>
          <email>m.wynn@qut.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arthur H. M. t</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Janne Barnes</string-name>
          <email>janne.barnes@live.com.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Business Change Manager, Queensland University of Technology</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information Systems, Queensland University of Technology</institution>
          ,
          <addr-line>Brisbane</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The socioeconomic consequences of not successfully completing PhD studies have motivated universities to expend dedicated efforts on improving student journeys. These journeys leave traces in a variety of university IT systems and this trace data can be exploited to derive insights through the application of process mining. Process mining is a form of data-driven process analytics, where process data, collated from different IT systems, is analysed to uncover the real behaviour and performance of processes. Despite its potential application, process mining hitherto has not been applied to visualise, analyse, and improve PhD student journeys, to the best of our knowledge. This paper reports on the findings of a process mining case study conducted at an Australian University that had espoused a digital transformation initiative to improve PhD student journeys. The case study utilised interactive and comparative process mining techniques and focused on clarifying the way a PhD student journey eventuates, visualising the differences between the real (actual) and prescribed (recommended) processes, comparing the performance of different cohorts, identifying root causes for adverse outcomes, and providing evidence-based recommendations for the digital transformation initiative. The findings from this study resulted in restructuring of HDR services and the introduction of a new research management system.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Mining</kwd>
        <kwd>Digital Transformation</kwd>
        <kwd>Higher Education</kwd>
        <kwd>Process Improvement</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Technology is transformative. Throughout the world, higher education is undergoing
digital transformation [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], which is a result of increasing competition among
universities and the affordances of digital technologies [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Queensland University of Technology
(QUT), situated in Australia, undertook one such digital transformation project that
aimed to restructure the university’s Higher Degree Research (HDR) services and
replace core research management systems. To this end, one objective was to understand and
improve the journeys of PhD students (the PhD being an HDR degree), i.e., the various steps PhD students
take throughout their degrees as reflected in their interactions with university systems.
      </p>
      <p>Analysing the PhD journeys and associated processes at QUT was expected to contribute
to a better design of the new research management system and to the timely completion of PhDs.</p>
      <p>
        Completion rates of PhD programs at universities have been a long-standing
concern for national governments [
        <xref ref-type="bibr" rid="ref22 ref5 ref8">5,22,8</xref>
        ] and a pressing issue for higher education. While
there are many facets to the complex problem of timely PhD completions, the focus in
this paper is PhD student journeys. A student journey can be seen as a process as it
involves a number of well-defined steps with inter-dependencies and documents which
are needed as input or are produced by these steps.
      </p>
      <p>
        Taking this process lens allows one to leverage methods and techniques from the
discipline of process mining. Process mining is a specialised form of data-driven
process analytics, which enables the extraction of detailed insights regarding process
behaviour, process performance, conformance of processes to existing process models,
and process improvement opportunities from event logs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Process mining can thus be
used to understand complex unstructured PhD journeys and assist in identifying
unnecessary variations, reasons for delay, and complexity drivers. A process-oriented view
also helps to transcend the isolated view of data collections that dominates traditional
data mining techniques [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] prevalent in higher education.
      </p>
      <p>
        Some initial applications of process mining in higher education can be found in
literature [
        <xref ref-type="bibr" rid="ref4 ref6">4,6</xref>
        ]. While these applications show the potential of process mining, the full
breadth of process mining techniques, such as the interactive discovery and
conformance checking of process models, comparative process mining, and the identification
of root causes, has not been explored in depth.
      </p>
      <p>In this paper, we present the insights from a process mining case study conducted
at QUT over a period of six months<sup>3</sup>. The case study shows (1) how a range of process
mining techniques can be applied to better understand PhD student journeys and (2)
how this understanding can be translated into concrete recommendations for process
improvement and changes to systems at a higher education institution.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>Universities today rely heavily on IT systems to support their processes. These systems
record huge amounts of data which can be studied to reveal valuable information to
improve these processes. To this end, various data-driven analysis approaches have been
applied. These can be subdivided into data mining and process mining techniques.</p>
      <p>
        Several data-mining techniques have been used to address questions similar to those
addressed in this paper. Regression has been used to study the influence of variables
on one another [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], for instance, to study the relation between the amount of leave
taken and the time taken to complete a degree. Such techniques are able to identify
high-level correlations between context factors and outcome variables; however, they are
not able to provide views of the process/journey that led to these outcomes. Sequential
pattern mining studies the occurrence of events in a sequence in order to find statistically
relevant patterns and to predict student outcomes [
        <xref ref-type="bibr" rid="ref14 ref15">15,14</xref>
        ]. While such techniques can
analyse sequences of events in a student journey, they do not provide complete
end-to-end process views and abstract away choices and parallelism. Consequently, such
techniques are unable to identify deviations, loops, and infrequent behaviour, which are
relevant to answering some process-related questions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
        <fn><label>3</label><p>Janne Barnes was the business change manager at QUT during this period.</p></fn>
      </p>
      <p>
        Process mining has been applied in the context of online learning environments
such as Massive Open Online Courses (MOOCs). For example, [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] analysed log data of
student interactions with online course materials, clustered cases to derive accurate
process models, and compared different cohorts. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] used process mining to derive student
learning workflows from virtual learning environment logs with decision trees
capturing the rules that control students’ adaptive learning. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] proposed a process mining
approach that can predict student dropouts in MOOCs at an early stage.
      </p>
      <p>
        Furthermore, process mining has been employed to mine curricula, providing
insights for university administrators and course coordinators about the different units a
student may choose throughout their course ([
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]). [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] developed a foundational
learning analytics model for higher education, which they claim can be used to provide
personalised learning and support services to students but which, to the best of our knowledge,
has not been applied to improve student journeys.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Introduction to the Case Study</title>
      <p>In Australia, timely completion is reinforced through PhD scholarship rules and the
Australian Qualifications Framework (AQF), which explicitly state the term of PhD studies as 3-4
years. The PhD program at QUT is expected to be completed within a three-year period.
During these three years, students need to complete three major milestones at which the
scope, plan and progress of the student are scrutinised: Stage 2, the Confirmation seminar,
and the Final seminar. A student may withdraw from the degree at any point, which
terminates the PhD journey. Independent of the three milestones, the student may take
periods of leave, apply for extensions, and complete annual progress reports, all of which are
done using online e-forms. These e-forms assist with monitoring the progress of a PhD
student, thus contributing to the timely completion of student journeys. The varied number
of activities involved, the inherent uncertainties of research, and the multi-year duration
make the PhD journey a complex process. As a result, the PhD journey is a rather
unstructured process, although the milestones are clearly defined.</p>
      <p>The digital transformation program aimed to improve the student and supervisor experience
by restructuring HDR services around these experiences. Another change imperative
was the centralisation and standardisation of services for HDR management to create
economies of scale. The project was primarily a process improvement initiative at the
enterprise level, with the overarching goal of more efficient research management and
enriched PhD journey experiences.</p>
    </sec>
    <sec id="sec-4">
      <title>Actions Taken</title>
      <p>Our process mining study consisted of four phases: identification of analysis questions,
data extraction and pre-processing, process mining analysis, and interpretation of results
and provision of recommendations.</p>
      <p>Identification of analysis questions. The following six questions were identified:</p>
      <p>AQ1 Assess the quality of time/event data for undertaking process mining analysis of a
PhD student journey;</p>
      <p>AQ2 Discover the different behaviours exhibited by students to better understand their
journey;</p>
      <p>AQ3 Compare the actual and prescribed process of the PhD student journey to identify
deviations;</p>
      <p>AQ4 Conduct a detailed performance analysis of the PhD student journey, particularly
in regards to completion time;</p>
      <p>AQ5 Identify root causes and patterns in the process models which could be used as risk
predictors for withdrawal from PhD programs;</p>
      <p>AQ6 Identify ways to improve the processing of e-forms to facilitate the student journey.</p>
      <p>Data extraction and pre-processing. A student journey dataset which covered all
the PhD students enrolled in the past 16 years (2002 to 2018) was analysed. In addition,
seven data sets each relating to one of the seven e-forms covering a period of 2016 to
2018 were analysed. Both datasets were complemented by a data set with demographic
factors such as: faculty (eight faculties in total), type of study (full-time, part-time),
mode of study (domestic, international), gender (male, female), and degree (various
HDR degrees). Datasets had to be merged to obtain logs appropriate for analysis.
Following this, a significant amount of effort was spent on data cleaning and pre-processing
(AQ1). The logs were cleaned to ensure consistency of activity labels across datasets.
Events without complete timestamps were filtered out, as they belonged to cases marked as active,
which were to be excluded from the analysis. In discussion with stakeholders,
logs were filtered for full-time students who completed their PhD (1139 cases with
13056 events) and who withdrew (498 cases with 7558 events). Furthermore, sublogs
for each milestone (Stage 2, Confirmation, and Final Seminar) were created to conduct
an in-depth analysis.</p>
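      <p>The filtering steps described above can be illustrated with a minimal sketch. It is not the SQL Server or DISCO workflow used in the study; the event tuples, activity labels, and function names are hypothetical, and the sketch assumes that a case is treated as active (and dropped as a whole) when any of its events lacks a timestamp, and that the outcome of a case is given by its last activity.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical event log: (case id, activity, ISO date or None).
# The activity labels are illustrative assumptions, not the labels used at QUT.
EVENTS = [
    ("s1", "Enrolment", "2012-02-01"),
    ("s1", "Stage 2", "2012-05-01"),
    ("s1", "Confirmation", "2013-02-01"),
    ("s1", "Final Seminar", "2015-03-01"),
    ("s1", "Completed", "2015-09-01"),
    ("s2", "Enrolment", "2016-02-01"),
    ("s2", "Stage 2", None),  # missing timestamp: treated as an active case
    ("s3", "Enrolment", "2014-02-01"),
    ("s3", "Leave", "2014-04-01"),
    ("s3", "Withdrawn", "2014-08-01"),
]


def group_by_case(events):
    """Group events into traces keyed by case id, preserving input order."""
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((activity, timestamp))
    return dict(traces)


def drop_active_cases(traces):
    """Keep only cases in which every event carries a timestamp."""
    return {
        case_id: trace
        for case_id, trace in traces.items()
        if all(timestamp is not None for _, timestamp in trace)
    }


def split_by_outcome(traces, outcomes=("Completed", "Withdrawn")):
    """Build one sublog per outcome, based on the last activity of each trace."""
    sublogs = {outcome: {} for outcome in outcomes}
    for case_id, trace in traces.items():
        last_activity = trace[-1][0]
        if last_activity in sublogs:
            sublogs[last_activity][case_id] = trace
    return sublogs


closed = drop_active_cases(group_by_case(EVENTS))
sublogs = split_by_outcome(closed)
print(sorted(closed))
print(sorted(sublogs["Completed"]), sorted(sublogs["Withdrawn"]))
```
      </preformat>
      <p>Milestone sublogs could be derived in the same spirit by cutting each trace at the corresponding milestone activity.</p>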
      <p>
        Process mining analysis. Once the data sets were retrieved, cleaned and pre-processed,
they were subjected to analysis. The main tools we used for analysis were: multiple
ProM plug-ins (in particular the Inductive visual Miner (IvM) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and
ProcessProfiler3D [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]), DISCO<sup>4</sup>, and SQL Server. IvM was used to automatically discover and
manually build process models. ProcessProfiler3D was used to visualise and compare
the performance of different cohorts. DISCO was used for its elaborate filtering
features, and SQL Server was used for cleaning data as well as for aggregate calculations.
      </p>
      <p>Interpretation of results and provision of recommendations. During our
analysis, we frequently sought feedback on intermediate results from the stakeholders, rather
than sharing the results only at the end. In addition to regular informal meetings, we
gave three presentations to the stakeholders. This way, we gained a deeper understanding of
the intricacies of the domain and were able to adjust our next steps to increase the
accuracy of our analysis. This was essential given the complexity of the PhD journeys
and the data quality issues we encountered. Close collaboration with stakeholders also
assisted with live conformance checking. For instance, we manually built the models of
some processes with the assistance of the IvM in collaboration with the stakeholders to
visualise deviations from prescribed models, rules, and regulations.
<fn><label>4</label><p>https://fluxicon.com/disco/</p></fn></p>
    </sec>
    <sec id="sec-5">
      <title>Results Achieved</title>
      <sec id="sec-5-1">
        <title>Discovery of the PhD Student Journey (AQ2)</title>
        <p>
          Given the complexity of the PhD journey, the stakeholders were interested in visualising
the key activities undertaken in the PhD journey. The stakeholders were also interested
to know about the distribution of leave in the PhD journey as leave has been identified
as a potential risk factor for delays in PhD completion [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>To discover process models, an automated process discovery technique, IvM, was
used. However, automatic process discovery algorithms tend to overgeneralise the
presence of recurring activities, i.e., show excessive behaviour that does not reveal much
about the actual behaviour of processes. For example, the automatically discovered
model showed leave (a recurring activity) in parallel to a milestone with a frequency
of 250. The frequency indicates that leave was taken 250 times in total; however,
it gives little information about when in the journey the leave was taken, which was what the
stakeholders were interested in. Therefore, we manually adapted the automatically
derived models to better fit the needs of the stakeholders. For example, we adapted the process
model to display activities for the different types of leave taken just before and after
major milestones. Figure 1 shows this adaptation for the final seminar, where different
types of leave, such as maternity, sick and other approved leave, are taken prior to the
final seminar. The adapted models showed key events, such as students being placed
under review, students resubmitting documents, and students taking leave, together with
when in the journey these events happened and how often they occurred. The resulting
models also revealed both the frequency and the duration of the various types of leave.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Deviations From Expected Student Journey (AQ3)</title>
        <p>The stakeholders wished to find out whether Annual Progress Reports (APR) were only
submitted after Confirmation. To this end, we constructed two process models to see if
there were students completing an APR before Stage 2 or between Stage 2 and
Confirmation. Interestingly, we found that 308 students submitted APRs before Stage 2,
which surprised the stakeholders and provided an example of unnecessary utilisation of
resources. This discovery resulted in a revision of the APR workflow, with the first APR
request initiated six months after Confirmation.</p>
        <p>In addition to constructing models, we also visualised deviations from frequent
paths observed in the log using Inductive visual Miner. Figure 1 shows an example
of such deviations (in red). It was interesting to discover that eight students skipped
Final Seminar yet completed their PhD. This visualisation also helped to identify
infrequent behaviour, which assisted with understanding the different trajectories PhD
students have taken in the past.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Performance analysis of the PhD student journey (AQ4)</title>
        <p>PhD completion times are a concern for both individuals and universities due to
significant economic and psychological costs. At present, QUT expects a PhD journey to be
completed in three years. However, during their PhD journey a student may take periods
of leave which extend the duration of the program.</p>
        <p>We found that only 4.65% of the students completed their PhD within three years,
55% completed within four years, and 82% completed within four and
a half years. These results surprised the stakeholders, as they did not expect such a
small percentage of the students to complete within three years. Longer completion
times cause additional costs to the university due to higher resource utilisation. Further
discussions with the stakeholders revealed that they considered durations from
enrolment to lodgement of the thesis, while in fact the actual duration of a PhD journey is better
viewed as starting at enrolment and ending at submission of the thesis after addressing the
external examiners' comments, as students still use university resources such as printers,
desks, and laptops after thesis lodgement. This insight was met with a positive response
from the stakeholders and resulted in a new definition of the duration of a PhD journey
as well as a calculation of the associated costs.</p>
        <p>We also explored variations in performance of the students across demographic
factors present in the log, such as gender, faculty, and more. We used ProcessProfiler3D for
comparative process mining, to showcase multiple cohorts for each factor (e.g., different
faculties) in one view. Figure 2 shows a high-level process model in ProcessProfiler3D
showing performance measures of student cohorts grouped by faculty. We found no
variations in performance for PhD students across the demographic factors, an
interesting finding in itself for the stakeholders, as it implies that no special attention needs to
be paid to a particular cohort and no cohort-specific reforms are required.</p>
      </sec>
      <sec id="sec-5-4">
        <title>The Journey of Students that withdrew from the degree (AQ5)</title>
        <p>Another question of interest was when students withdraw during the journey and why,
i.e., what the factors are that influence the students’ withdrawal from the course. To
answer this question, we discovered a process model for each phase of the PhD journey
(each corresponding to one of the three milestones).</p>
        <p>
          We found that 35 out of 280 students withdrew just after starting the course. Another
31 students took some type of leave and then withdrew before Stage 2. Out of 280
students, only 214 submitted their Stage 2 proposal. This finding was interesting to the
stakeholders and raised the question of why some students decided to withdraw so soon
after enrolment without even attempting to achieve their first milestone. Though the
actual reason cannot be ascertained from the model, one can think of multiple causes
for this attrition, e.g., the student-supervisor relationship, financial constraints, or health issues [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          Our observations also indicate that students who withdraw take more time to
finalise their confirmation milestone than students who successfully complete the degree.
Furthermore, we found that 50% of the students who withdraw after completing their
confirmation withdraw within the next year. This insight highlights the need to ensure
ample support and supervision in the first year of a student’s PhD [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. For the other
milestones we did not observe major differences between the two student cohorts.
        </p>
        <p>The stakeholders were also interested in a more in-depth analysis of duration and
frequency of periods of leave before and after the three major milestones. Six process
models were built, one for each milestone-cohort combination. Analysis of all six
models shows that students who eventually withdraw from the program take leave more
often and for longer periods of time during the journeys to Stage 2 and also from Stage 2
to Confirmation. Figure 3 demonstrates the difference in the frequency and duration
of different leave types between completed and withdrawn students during the journey
from Stage 2 until Confirmation.</p>
        <p>This pattern observed for the withdrawn students was considered to be an early risk
indicator of potential withdrawal. Acknowledging this as an early risk indicator, QUT
decided to initiate bi-annual health-check emails to ensure that PhD students have an
opportunity to report on any health issues, which can be brought to the attention of the
university as well as to the supervisors.</p>
      </sec>
      <sec id="sec-5-5">
        <title>Improvement in the processing of e-forms (AQ6)</title>
        <p>E-forms are the main mechanism for students to interact with the university processes
and are, therefore, an integral component of the PhD journey. Another question raised
by the stakeholders was how the processing of e-forms could be improved (AQ6). To
address this question, we checked for bottlenecks in the processing of e-forms, again
using process models enhanced with performance information. We found that, among
all performed activities, the activity ‘RSC approval’ and the activities residing with the
Research Student Centre (RSC) take the most time, as is shown by the
process model associated with the ‘Student Leave’ e-form. On sharing the median and
average duration of processing e-forms with the stakeholders, it was confirmed that the
time taken was much longer than they expected (note that this was not a given, as
certain activities may by their very nature take longer). We repeated the process for
the remaining e-forms (seven e-forms in total) and found the same bottlenecks for all
of them. In addition to this, we found that the students take considerable time filling
in the e-forms. These findings resulted in changes to the e-form workflows (e.g.,
pre-filling them automatically as much as possible) and a restructuring of the way the RSC
processes these forms.</p>
        <p>In terms of the overall time taken to process e-forms, i.e., the case duration, it was
found that some e-forms were completed within an expected time frame, while others took
exceptionally long to complete. Consequently, the stakeholders were interested in
identifying the reasons for these long delays. To answer this question, we started by
discovering process models for each e-form. We found that all process models contain loops,
which indicates the presence of rework in the processing of the e-forms.</p>
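        <p>As a rough illustration of how rework can be spotted directly in an event log, the sketch below counts how often an activity recurs within a single e-form instance. This is only a simple proxy and not the model-based loop analysis performed with the process mining tools; the traces, activity names, and the function name are hypothetical.</p>
        <preformat>
```python
from collections import Counter

# Hypothetical e-form traces; the activity names are illustrative assumptions.
TRACES = {
    "form-1": ["Submit", "Supervisor approval", "RSC approval", "Done"],
    "form-2": ["Submit", "Supervisor approval", "Submit",
               "Supervisor approval", "RSC approval", "Done"],
}


def rework_counts(trace):
    """Return, per activity, how many times it recurs after its first occurrence."""
    counts = Counter(trace)
    return {activity: n - 1 for activity, n in counts.items() if n > 1}


for case_id, trace in TRACES.items():
    print(case_id, rework_counts(trace))
```
        </preformat>
        <p>An empty result means that no activity was repeated in that instance; non-empty results point to instances worth inspecting for rework.</p>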
        <p>To investigate these loops further, we abstracted out the rework loops in the process
into sub-processes using hierarchical process models. The top level of the hierarchy showed
the key activities of the loop and the bottom level was a simplified version of the
original process model. This enabled us to focus on those parts of the model that concerned
rework and hence retrieve performance measurements for these loops. To get further
information, we created and added an extra attribute to our event log, which indicated
whether a particular e-form instance was long-running or not. An e-form instance was
considered to be long-running if its duration exceeded the mean e-form processing
duration by at least 2.5 standard deviations. The resulting log and hierarchical
process model were used as input for ProcessProfiler3D. Unsurprisingly, the visual
representation brings out the striking difference between the cohort of long-running cases and
the cohort of cases with an expected duration (see Figure 4). The performance
visualisation shows that normal cases take much less time for certain activities than the
long-running cases.</p>
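        <p>The long-running attribute can be sketched as follows. The durations and case identifiers are made up for illustration, the function name is hypothetical, and the use of the population standard deviation is an assumption, since the study does not state which estimator was used.</p>
        <preformat>
```python
from statistics import mean, pstdev

# Hypothetical processing durations (in days) per e-form instance.
DURATIONS = {
    "f01": 4, "f02": 5, "f03": 5, "f04": 6, "f05": 5, "f06": 4,
    "f07": 6, "f08": 5, "f09": 5, "f10": 6, "f11": 4, "f12": 60,
}


def flag_long_running(durations, k=2.5):
    """Map each case id to True if its duration is at least mean + k * std."""
    values = list(durations.values())
    # Assumption: population standard deviation (pstdev), not the sample one.
    threshold = mean(values) + k * pstdev(values)
    return {case_id: d >= threshold for case_id, d in durations.items()}


flags = flag_long_running(DURATIONS)
print([case_id for case_id, is_long in flags.items() if is_long])
```
        </preformat>
        <p>The resulting flag can then be attached to every event of a case, so that a comparison tool can split the log into a long-running cohort and a normal cohort.</p>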
        <p>To further understand why in some instances the processing of e-forms takes an
exceptionally long time, we used the ‘trace visualisation’ feature of ProcessProfiler3D.
This feature enables the user to visualise the trajectory of process instances through the
process model. This time we divided the cases into five cohorts (where four cohorts
correspond to the first, second, third, and fourth quartiles respectively, and the fifth
cohort corresponds to the long-running outlier cases) by adding another attribute to the
event log. The resulting visualisation conveyed that long-running cases involve
multiple loops, resulting in substantially more rework than in other cases.</p>
        <p>
          Once activity duration and multiple loops were identified as underlying reasons for
delay, we were also interested in investigating whether any of the demographic factors were
associated with long-running cases. We used relevant data mining techniques (contrast
set learning [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and decision tree mining [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]) to uncover the potential influence of
the attributes on the duration of cases. We found that none of the demographic factors
(faculty, gender, scholarship holder, type of study) had an impact on the processing time
of e-forms. This was insightful for stakeholders.
        </p>
        <p>To identify more potential improvements in the processing of e-forms, the
stakeholders also wanted an analysis of variations in the processing of e-forms across six
faculties of interest. To address this, we filtered the event log by faculty (using DISCO)
and collated data for each faculty, as shown in Figure 1. It is evident that Faculty 3
takes more time for ‘faculty feedback’ than other faculties. This surprised stakeholders
as they assumed Faculty 3 to have the fastest processing times. Based on this and other
similar findings, a case for standardisation of processing of e-forms across faculties
was proposed.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Benefits and Lessons Learned</title>
      <p>Application of process mining techniques brought forth numerous benefits for the
digital transformation program at QUT. Specifically, the following benefits were achieved:</p>
      <p>1. Better visibility of the end-to-end PhD journey and identification of task
dependencies in this journey: Visualising the end-to-end journey of completed and
withdrawn PhD students was considered insightful by the stakeholders. This analysis
gave them a better understanding of the task dependencies as well as the different
behaviours exhibited by students in the past.</p>
      <p>2. Enhanced visualisation of deviations from the expected student journey: We
observed deviations from typical student journeys, and these observations resulted in
the introduction of changes to the PhD journey. New decision points were added in the
workflows of the APR and Stage 2 milestone e-forms.</p>
      <p>3. Clear identification of patterns in the journey of withdrawn students as early risk
indicators in the PhD journey: Our analysis showed that most students withdrew in
the first year of their PhD program. Additionally, the data points towards a pattern
of increased frequency and duration of periods of leave for students who withdrew.
To reduce attrition, QUT has decided to implement automated health check forms
every six months as a means to monitor the well-being of students at an early stage.</p>
      <p>4. Data-driven insights for process improvement for e-forms: We found that the time
taken by the RSC to process e-forms and the time taken by students to complete these
e-forms are performance bottlenecks. The data also revealed that rework
takes considerable time in the processing of e-forms. This finding resulted in
a reformulation of task handover rules in the RSC. Furthermore, to reduce form
completion times by students, pre-population of forms using information already
available in the database was introduced.</p>
      <p>5. Better evidence for standardisation of processing of e-forms across faculties: We
found variations in e-form processing performance across faculties. This insight
assisted in making a case for the standardisation of e-form processing across faculties,
which was approved by the relevant authority.</p>
      <p>Insights from this case study had supported the introduction of a new research
management system at QUT. According to the manager of the project, “ this project allowed
for the review of policies to support the student journey and has underpinned
innovative thinking in how to redesign processes and forms for students.” Furthermore, the
approach presented here can be replicated by other universities enabling them to use
process mining techniques in addition to other methods to improve PhD student
journeys. Here, we summarise the lessons learned from this study:
1. The need for interactive process mining: The study highlights the importance of
continual interaction with stakeholders to uncover relevant process models. This is
particularly useful where a standard or normative process is not documented, as
it prevents behaviours from being ruled out without an underlying reason. Similarly, ‘live’
conformance checking can enrich the analysis with domain knowledge and assist
in obtaining accurate insights. Additionally, the questions asked during such
interactions can help stakeholders assess the correctness of existing policies
and surface policies that are not yet documented.
2. Significance of comparative process mining: Universities usually have a certain
degree of autonomy compared with other organisations, which is why variants of
processes may be observed. Hence, analysing cohorts of interest in order to further
standardise the process can contribute to better performance. Once the logs of the cohorts
are obtained, process models can be discovered and performance measures calculated,
and both can then be compared across cohorts. Comparative analysis enables the identification of variants and
root causes of performance differences among cohorts. These findings can in turn
be used by decision makers to provide targeted support.
3. Significance of data-driven evidence to validate hypotheses: The data-driven
evidence provided by process mining analysis allowed stakeholders to validate or
disprove hypotheses about the student journey. As mentioned by a stakeholder,
“[Process mining] approach allowed us to have a conversation about the student journey
which were not based in beliefs, but data, often in process improvement initiatives
the project doesn’t have any authority on the subject, as the business areas have a
better understanding of the process. Having data allowed the project to challenge
conventional beliefs.”
</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>The findings of this paper demonstrate the value of applying process
mining techniques in higher education. They also convey how process mining techniques
and insights can be used to support a digital transformation initiative, in this case an
overhaul of traditional research management systems and associated processes. Some
of our findings are also reflected in existing literature, reinforcing the validity of the
application of process mining in addressing the analysis questions presented in this study
and providing further empirical evidence to the field of higher education studies.
Furthermore, the approach presented here can be replicated by other universities, enabling
them to use process mining techniques to improve PhD student journeys. The work
presented in this paper is limited to descriptive analysis; future work can incorporate
more advanced predictive analysis capabilities. Techniques to systematically
translate findings into improvements are also recommended. This work also
opens opportunities for future research, notably conducting process mining
case studies at other higher education institutions in Australia and internationally
to improve PhD student journeys.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          : Process Mining: Data Science in Action. Springer, Heidelberg (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gorissen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Comparative process mining in education: An approach based on process cubes</article-title>
          .
          <source>In: International Symposium on Data-Driven Process Discovery and Analysis</source>
          . pp.
          <fpage>110</fpage>
          -
          <lpage>134</lpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Agné</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mörkenstam</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Should first-year doctoral students be supervised collectively or individually? Effects on thesis completion and time to completion</article-title>
          .
          <source>Higher Education Research &amp; Development</source>
          <volume>37</volume>
          (
          <issue>4</issue>
          ),
          <fpage>669</fpage>
          -
          <lpage>682</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bogarín</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cerezo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A survey on educational process mining</article-title>
          .
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          <volume>8</volume>
          (
          <issue>1</issue>
          ),
          <elocation-id>e1230</elocation-id>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Caparrós-Ruiz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Time to the doctorate and research career: some evidence from Spain</article-title>
          .
          <source>Research in Higher Education</source>
          <volume>60</volume>
          (
          <issue>1</issue>
          ),
          <fpage>111</fpage>
          -
          <lpage>133</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cerezo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bogarín</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esteban</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Process mining for self-regulated learning assessment in e-learning</article-title>
          .
          <source>Journal of Computing in Higher Education</source>
          <volume>32</volume>
          (
          <issue>1</issue>
          ),
          <fpage>74</fpage>
          -
          <lpage>88</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>De Freitas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.:
          <article-title>Foundations of dynamic learning analytics: Using university student data to increase retention</article-title>
          .
          <source>British Journal of Educational Technology</source>
          <volume>46</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1175</fpage>
          -
          <lpage>1188</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Geven</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skopek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Triventi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>How to increase PhD completion rates? An impact evaluation of two reforms in a selective graduate school, 1976-2012</article-title>
          .
          <source>Research in Higher Education</source>
          <volume>59</volume>
          (
          <issue>5</issue>
          ),
          <fpage>529</fpage>
          -
          <lpage>552</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Treatment learning: Implementation and application</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of British Columbia (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hunter</surname>
            ,
            <given-names>K.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Devine</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Doctoral students' emotional exhaustion and intentions to leave academia</article-title>
          .
          <source>International Journal of Doctoral Studies</source>
          <volume>11</volume>
          (
          <issue>2</issue>
          ),
          <fpage>35</fpage>
          -
          <lpage>61</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Koenker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bassett</surname>
            ,
            <given-names>G.</given-names>
            <suffix>Jr.</suffix>
          </string-name>
          :
          <article-title>Regression quantiles</article-title>
          .
          <source>Econometrica: Journal of the Econometric Society</source>
          pp.
          <fpage>33</fpage>
          -
          <lpage>50</lpage>
          (
          <year>1978</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Leemans</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
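          <string-name>
            <surname>Fahland</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.</given-names>
          </string-name>
          :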
          <article-title>Exploring processes and deviations</article-title>
          .
          <source>In: International Conference on Business Process Management</source>
          . pp.
          <fpage>304</fpage>
          -
          <lpage>316</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Littlejohn</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hood</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Reconceptualising learning in the digital age: The [un] democratising potential of MOOCs</article-title>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Perera</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kay</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koprinska</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yacef</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaïane</surname>
            ,
            <given-names>O.R.</given-names>
          </string-name>
          :
          <article-title>Clustering and sequential pattern mining of online collaborative learning data</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>21</volume>
          (
          <issue>6</issue>
          ),
          <fpage>759</fpage>
          -
          <lpage>772</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Poon</surname>
            ,
            <given-names>L.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kong</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>M.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yau</surname>
            ,
            <given-names>T.S.</given-names>
          </string-name>
          :
          <article-title>Mining sequential patterns of students' access on learning management system</article-title>
          .
          <source>In: International conference on data mining and big data</source>
          . pp.
          <fpage>191</fpage>
          -
          <lpage>198</lpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Rokach</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maimon</surname>
            ,
            <given-names>O.Z.</given-names>
          </string-name>
          :
          <article-title>Data mining with decision trees: theory and applications</article-title>
          , vol.
          <volume>69</volume>
          . World Scientific (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>van de Schoot</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yerkes</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mouw</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sonneveld</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>What took them so long? Explaining PhD delays among doctoral candidates</article-title>
          .
          <source>PLoS ONE</source>
          <volume>8</volume>
          (
          <issue>7</issue>
          ),
          <elocation-id>e68839</elocation-id>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Schulte</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez de Mendonca</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez-Maldonado</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buckingham Shum</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Large scale predictive process mining and analytics of university degree course data</article-title>
          .
          <source>In: International Learning Analytics &amp; Knowledge Conference</source>
          . pp.
          <fpage>538</fpage>
          -
          <lpage>539</lpage>
          . ACM (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Umer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Susnjak</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mathrani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suriadi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>On predicting academic performance with process mining in learning analytics</article-title>
          .
          <source>Journal of Research in Innovative Teaching &amp; Learning</source>
          <volume>10</volume>
          (
          <issue>2</issue>
          ),
          <fpage>160</fpage>
          -
          <lpage>176</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vázquez-Barreiros</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mucientes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Recompiling learning processes from event logs</article-title>
          .
          <source>Knowledge-Based Systems</source>
          <volume>100</volume>
          ,
          <fpage>160</fpage>
          -
          <lpage>174</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaïane</surname>
            ,
            <given-names>O.R.</given-names>
          </string-name>
          :
          <article-title>Discovering process in curriculum data to provide recommendation</article-title>
          .
          <source>In: EDM</source>
          . pp.
          <fpage>580</fpage>
          -
          <lpage>581</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Winchester-Seeto</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Homewood</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thogersen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacenyik-Trawoger</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manathunga</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reid</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holbrook</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Doctoral supervision in a cross-cultural context: Issues affecting supervisors and candidates</article-title>
          .
          <source>Higher Education Research &amp; Development</source>
          <volume>33</volume>
          (
          <issue>3</issue>
          ),
          <fpage>610</fpage>
          -
          <lpage>626</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Wynn</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poppe</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>ter Hofstede</surname>
            ,
            <given-names>A.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>ProcessProfiler3D: A visualisation framework for log-based process performance comparison</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>100</volume>
          ,
          <fpage>93</fpage>
          -
          <lpage>108</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Digital transformation in higher education: critiquing the five-year development plans (2016-2020) of 75 Chinese universities</article-title>
          .
          <source>Distance Education</source>
          <volume>40</volume>
          (
          <issue>4</issue>
          ),
          <fpage>515</fpage>
          -
          <lpage>533</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>