<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining compilation data to better prepare and assign K-12 coding mentors</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chinmay Sheth</string-name>
          <email>shethc@mcmaster.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vaitheeka Nallasamy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kruthiga Karunakaran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephanie Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yiding Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christopher Kumar Anand</string-name>
          <email>anandc@mcmaster.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>McMaster University</institution>
          ,
          <addr-line>1280 Main St W, Hamilton, ON L8S 4L8</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <fpage>39</fpage>
      <lpage>50</lpage>
      <abstract>
        <p>Our university outreach program has introduced over 30,000 Grade 4 to 8 students to functional programming in Elm over the last decade. Pre-pandemic, mentors would visit students’ classrooms in person to conduct workshops aligned with their curriculum. With the advent of the pandemic, this switched to virtual visits, which allowed us to teach children in remote regions and other countries, greatly increasing our reach. Further increasing our reach will require smarter use of our most important resource: coding mentors. In this retrospective study, we looked at statistics for compilations on our web-based integrated development environment in order to identify patterns that could immediately inform mentor training programs and, in the medium term, be used to optimize resources by directing mentors to the students most in need of help, even when they have not explicitly requested help via the built-in mentor chat.</p>
      </abstract>
      <kwd-group>
        <kwd>computer science education</kwd>
        <kwd>introduction to programming</kwd>
        <kwd>teacher dashboard</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Learning to Code has been called the Literacy of the 21st Century. Bers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] asks “What is literacy? It
is the ability to use a symbol system (a programming language or a natural written language) and a
technological tool [..] to comprehend, generate, communicate, and express ideas[...]” Among the reasons
we care about it: “Literacy ensures participation in decision-making processes and civic institutions.
Those who can’t read and write are left out of power structures. Their civic voices are not heard.” At a
time when someone will be making decisions about how software is used and how it therefore impacts
society, it is important that the majority be capable of participating in the debate. Burke et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] amplify
this sentiment: “If a learner reaches adulthood and cannot read or write, it is generally identified as
a collective societal failure. As society is increasingly digitized, students need to be able to read and
understand the information contained in code”.
      </p>
      <p>
        Whereas Bers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] argues for the importance of coding tools for children as young as kindergarten,
for whom they point to the success of block coding and tangibles, addressing the need to “read and
understand the information contained in code” at an adult level probably requires an understanding of
textual code. It is therefore important to make text-based coding easier to learn. In our outreach
program, which has visited thousands of classrooms and introduced over 30,000 children to coding,
the bottleneck to growth has long been our ability to train mentors to visit classes in person or virtually.
Anecdotally, we know that primary teachers are very reluctant to introduce text-based coding: they
have insufficient training in it, but have heard enough word-of-mouth evidence that even children doing the
simplest things can cause compilers to emit cryptic error messages.
      </p>
      <p>How can we better train undergraduate mentors now, and, in the future, primary teachers? This
paper looks at the data generated by our Web-based Integrated Development Environment (WebIDE),
to see if it offers useful insights for training future instructors. Our WebIDE is backed by a server
running the Elm compiler. To allow children to return to previous versions of their code and to support
our help system, every submission is stored on the server each time the compile button is clicked. The
help system consists of a chat pane where children can ask for help. Mentors can edit and compile a
copy of the children’s code to understand the problem and double-check the proposed solution. This
is especially helpful for in-class workshops where screen sharing is not possible. But children do not
always ask for help when they need it. Analyzing data on compilation results could be used in three
ways: (1) to better train mentors for the types of problems children will encounter, (2) to provide mentors
with real-time feedback on which students need assistance, and (3) to improve resource allocation as an
organization. As a first step toward these goals, we asked two research questions:
RQ1: Are compiler error types correlated with activity type?
RQ2: Does time to resolve compiler errors follow a statistical distribution?</p>
      <p>Another feature of our original WebIDE led to a natural experiment. Namely, students were presented
with a “slot” system for storing their code, referencing the slot systems many game systems use for
storing games. The slots were grouped into 10 per activity type, and the basic activity types mirrored
the initial lessons in our outreach program, so they are a good proxy for student experience.</p>
      <sec id="sec-1-1">
        <title>1.1. Contributions</title>
        <p>
          This paper contributes to STEM education in three ways:
1. Understanding syntax and type errors of beginning programmers will lead to more effective
introductory Computer Science teaching, Computer Science being one important Science.
2. The Algebraic Thinking curriculum being used in these code examples was developed to help
children solidify their arithmetic and geometric knowledge and prepare them for high school
algebra, all of which are core Mathematics subjects.
3. Finally, Silver et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] have identified success in high school algebra as the bottleneck to success
in high school and a gateway to STEM education pathways, so making progress here makes all
higher STEM subjects more accessible.
        </p>
        <p>The specific contributions of this paper are the identification of the power law as best fitting the
distribution of time to fix compilation errors, and the observation that error types rapidly change over
time as young students learn to code in Elm. The remaining sections of this paper discuss related work,
the background of our outreach program, methods, results and conclusion.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        There is a long history of applying analytics in education [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Of most interest to computer scientists are
the linked literatures on Open Learner Models and Analytics Dashboards. Open Learner Models (OLMs)
arose out of the attempt to capture some of the performance advantage of tutoring over classroom
teaching by using software tutors [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The simplest OLMs represent progress through a pre-defined
curriculum with accuracy on quiz questions and problems. In some cases, OLMs plot individual progress
against class averages. Learning Analytics Dashboards (LADs) “make use of data science methods to
analyze data and report the results” [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. There is a lot of overlap between the two. Since LADs are
supposed to improve learning, there is a growing movement to incorporate educational theories into
their design [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Starting with a different problem, our approach is following a different trajectory. Our approach
could be called “mentor in the loop”, in that we are not trying to synthesize a software tutor, but to
increase the effectiveness of the mentors we already employ. Synthesizing software tutors is difficult,
at least for introductory high-school algebra, but has been shown to be effective [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It is rewarding
and difficult research. On the other hand, using near-peer mentoring in computing education has a
secondary benefit that “youth’s interest in computer science (CS) can be sparked by providing them
with role models who are relatable and who resonate with their identities” [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This may be true in other
areas, but we know that in software, the rapid growth of the field guarantees that the number of role
models in the average community or family network will be small compared to the learning and career
opportunities. This will be exacerbated in economically disadvantaged communities. So it makes sense
to design systems to support near-peer mentors. Unlike normal teachers, near-peer mentors cannot be
expected to have any knowledge of educational theories (other than the discredited “learning styles”).
Our trained mentors spend from 10 to 100 hours mentoring, and a high turnover is acceptable, because it
is also a learning experience for the mentors. We are very confident that mentoring helps them improve
their communication skills, and we suspect that it also improves self-efficacy and metacognitive skills
in the mentors, but we have not studied this, and it is not needed to justify our program. While we
would like to develop an educational theory or interpret an existing theory in our context, and explain
the student characteristics blind clustering appears to identify, that is a long-term goal. In the near
future, we can measure the effectiveness of our LAD in terms of student output.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Functional programming</title>
        <p>
          Since our outreach program adopted Elm, a functional language for teaching, it is important to mention
some facts about Functional Programming. Functional programming [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is a value-oriented
programming paradigm, consisting of functions. Functions consume and produce values. There are no loops,
and conditional expressions replace conditional statements, but functions are first-class values and can,
e.g., be passed as parameters. There are two variations in functional programming languages: (1) typed
or not and (2) eager or lazy. These variations lead to differences in programming style.
        </p>
        <p>Many non-functional programming languages are adopting functional features, including Scala, Swift
and Python.</p>
        <p>
          Krishnamurthi and Fisler [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] agree with the common perception that writing programs in imperative
programming languages is much easier, as state provides convenient communication channels
between parts of a program, but this makes reasoning and debugging harder, whereas
functional programming has the opposite affordances. Students studying object-oriented programming
are taught different skills and programming styles, which reveals that the approach to
programming and problem-solving differs between students studying different paradigms. Functional programming
students perform better by using high-level structures and composing solutions out of simpler
functions than object-oriented students, who try to solve the entire problem in a single traversal of the data.
They also use built-in and higher-order functions to implement subtasks, which results in multiple passes
over the input data and in intermediate data whose memory must later be released. Functional programming
students create short functions for specific tasks, which create intermediate data. They also use filter
and map rather than loops and non-general library functions. Thus, we should expect that a student
who learns Java after learning functional programming may well program with different patterns than
a student whose prior experience was entirely imperative.
        </p>
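        <p>The contrast between the two styles can be sketched in a few lines. This example is illustrative only and is written in Python rather than Elm; the task (summing the squares of the positive values) and all function names are hypothetical and are not taken from the studies cited above. The composed version makes several passes and creates intermediate lists, while the single-pass version uses one loop with a mutable accumulator.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def is_positive(x):
    """Predicate used by filter."""
    return x > 0


def square(x):
    """Transformation used by map."""
    return x * x


def sum_of_squares_composed(values):
    """Functional style: compose small functions, making several passes."""
    positives = list(filter(is_positive, values))  # pass 1, intermediate list
    squares = list(map(square, positives))         # pass 2, intermediate list
    return sum(squares)                            # pass 3


def sum_of_squares_single_pass(values):
    """Imperative style: one traversal with a mutable accumulator."""
    total = 0
    for v in values:
        if v > 0:
            total += v * v
    return total


print(sum_of_squares_composed([-1, 2, 3]))     # 13
print(sum_of_squares_single_pass([-1, 2, 3]))  # 13
```
]]></preformat>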
        <p>Note that our experience is that functional programming with appropriate supports is easier for
Grade 4 to 8 students. Prior to using Elm, we had developed coding activities using Python, and there
was a dramatic improvement in student focus after we made the change.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Elm language</title>
        <p>
          Elm (https://elm-lang.org) is a functional language designed for the development of front-end web
applications [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and sold to front-end developers as a way of avoiding the many software quality
issues which plague JavaScript programs. Its syntax, based on Haskell, is intentionally simple. For
example, it has no support for user-defined type classes. In addition to strictly enforcing types, the
Elm compiler also forces programmers to follow best practices, such as disallowing incomplete case
coverage in case expressions. Elm apps use a model-view-update pattern in which users write pure
functions and the run-time system handles side effects without the need for advanced concepts. Elm
code compiles to JavaScript, simplifying deployment and visualization.
        </p>
        <p>While many consider that functional programming should be reserved for expert users, many of the
features useful for experts (strict types, pure functions) are also very useful for beginners. In addition to
the practical implications of compiling to JavaScript, Elm’s combination of simple syntax, strict typing,
and purity which matches students’ pre-existing intuition about math proves to be an asset to our
program. These features allow the development of tools and curricula which would not otherwise be
easy or possible in an imperative language with side effects such as Python.</p>
        <p>In another conference presentation, we present parallel work on supporting non-English-speaking
learners in a tool called ShapeCreator. The use of function composition to structure code which is
favoured by functional programmers is evident in the graphical layout of ShapeCreator, and it matches
the way geometry is taught in the early grades.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Outreach program</title>
      <sec id="sec-3-1">
        <title>3.1. McMaster Start Coding Program</title>
        <p>
          Our Outreach Program has been operating for the past decade. A mainly volunteer group of
undergraduate and graduate students develop lesson plans and deliver free workshops to schools, public libraries,
and community centres in the Hamilton, Ontario, Canada area [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. During the COVID-19 pandemic,
the program shifted online and has taught an increasing number of students. The goal of the program
is to foster interest and ability in STEM subjects through coding, especially for those groups who are
underrepresented in STEM subjects, such as girls and underprivileged youth.
        </p>
        <p>
          To support these workshops, we have developed several tools, including:
1. An open-source Elm graphics library [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], GraphicSVG<sup>1</sup>.
2. An online mentorship and Elm compilation system incorporating massive collaborative
programming tasks, including the Wordathon<sup>2</sup> and comic book storytelling<sup>3</sup>.
3. A curriculum for introducing graphics programming designed to prepare children for algebra [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. WebIDE</title>
        <p>In our IDE, we deliberately do not store any personally identifiable information, so this dataset includes
children who programmed during one class visit, a series of class visits, one or more summer camps,
training sessions for potential undergraduate mentors and first-year undergraduate students.</p>
        <p>Using a video-game metaphor, children have access to different coding environments, originally
with 10 slots available for different modules. Once they pick a slot, they see an interface with four
quadrants, for their code, graphical output or compiler errors, help chat, and (optionally) additional
information about the activity. If additional information is not available for a slot, half the screen is
devoted to their code. Different slots hide some or all components in a working program. For example,
in Picture slots, the main function is hidden and no interaction is possible; instead, children must define
a myShapes top-level definition whose type must be a list of shapes. In Animation slots, myShapes
is a function with an input which is a record with a single field, the current time in seconds since the
program was “played”. Depending on the level of the class, they may only learn to use the Picture slot,
or they may advance to the Animation slot in their second session. A Wordathon is a special activity
in which children are assigned beginner reading words to code as pictures or animations, and their
code is combined into a reading game. Sometimes classes compete against each other to win a
pizza party. Teachers like these activities because their students both feel like they are doing something
useful and get to compete in a fun competition, and they love pizza! The Wordathon slot has a second
definition, myWord of type String, where they must specify the word they are creating, and which will
be used when their module is imported into the game module. It also has a semi-transparent background
masking out the border of the output which will be masked out in the game. This is especially useful if
they want to animate an object sliding into view. Although there are many advanced slot types, Game
is the fourth commonly used slot type. It adds interactivity, which requires the definition of a message
(event) type, as well as a model type (which in previous slots was hard-coded to be a record containing
only the animation time). Only classes doing extended workshops used the Game slots. This allows us
to infer a lot about the experience and immediate goals of the users from the slot type they are using.</p>
        <p><sup>1</sup> https://package.elm-lang.org/packages/MacCASOutreach/graphicsvg/latest/GraphicSVG
<sup>2</sup> http://outreach.mcmaster.ca/#wordathon2019
<sup>3</sup> http://outreach.mcmaster.ca/#comics2019</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Method</title>
      <p>To give mentors access to modules with pending help requests, all versions of all
modules are kept on the file system of the server. Each module is stored using the child’s
randomly assigned ID, a timestamp, and the slot number (which encodes the activity type in the most significant digits).
We extracted 254,708 compilation attempts from 5330 users, from 2019 until January 2022. Each of the
code fragments was recompiled together with the hidden boilerplate code to extract the success/error
and error code results, and the results were added to a database. Python scripts were used to calculate
the time between the first error of a user in a slot and the next successful compilation, and to add this information
to the database. This data was then compared to known probability distributions, including by visual comparison.</p>
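      <p>A minimal sketch of the resolution-time calculation follows. It assumes each compilation has been reduced to a hypothetical record holding the user ID, slot, timestamp in seconds, and a success flag; the record layout and the function name resolution_times are assumptions for illustration and are not the scripts used in the study. For each user and slot, the timer starts at the first failed compilation after a success (or at the first compilation) and stops at the next successful one; errors that are never resolved produce no duration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from typing import NamedTuple


class Compilation(NamedTuple):
    user_id: str      # randomly assigned ID (hypothetical field name)
    slot: int         # slot number
    timestamp: float  # seconds
    ok: bool          # True if the compilation succeeded


def resolution_times(compilations):
    """Return seconds from the first error to the next success, per user and slot."""
    by_key = defaultdict(list)
    for c in compilations:
        by_key[(c.user_id, c.slot)].append(c)

    durations = []
    for key in sorted(by_key):
        first_error = None  # time of the first error in the current failing run
        for c in sorted(by_key[key], key=lambda c: c.timestamp):
            if not c.ok and first_error is None:
                first_error = c.timestamp
            elif c.ok and first_error is not None:
                durations.append(c.timestamp - first_error)
                first_error = None
    return durations


log = [
    Compilation("a", 1, 0, True),
    Compilation("a", 1, 10, False),
    Compilation("a", 1, 25, False),  # same failing run, timer keeps running
    Compilation("a", 1, 40, True),   # resolved 30 s after the first error
    Compilation("b", 1, 5, False),   # never resolved, so no duration
]
print(resolution_times(log))  # [30]
```
]]></preformat>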
      <p>Because errors include excerpts from code, this database was further reduced to the time, slot, user
id, and error type (not the whole error message) for errors. It was further cleaned to remove all user
ids corresponding to teachers, mentors and undergraduate students. This cleaned data contains 13012
rows, and is available for download as an Excel file together with the pivot tables used in the analysis.</p>
      <p>Next, the dataset with error types and timestamps was used to simulate a mentor dashboard in which
errors appear as they occur.</p>
      <p>Finally, we investigated the possibility of providing additional feedback by comparing compilation
behaviour with past observations. To this end we collected compilation results by user, resulting in a
high-dimensional dataset, to which we applied t-distributed stochastic neighbor embedding (t-SNE)
for dimensionality reduction, followed by k-means clustering. This produced very interesting-looking
complex clusters, but there were too many clusters for us to develop meaningful interpretations of the
results, so we do not include them in this paper, but hope other researchers will have greater success in
extracting meaning from this approach.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We were able to derive concrete answers to our research questions, and develop a mentor dashboard
which could provide mentors with an at-a-glance view of the time to resolve errors.</p>
      <sec id="sec-5-1">
        <title>5.1. Predicting the time to resolve an error</title>
        <p>
          To try to answer RQ2, our first attempt to predict when students might need help was to model the
time between an error occurring and the error getting resolved. Figure 1 shows that the time to resolve
a compiler error approximates a power law distribution [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Even though most of the errors in the
distribution took less than a minute to resolve, there are many errors which took a considerable amount
of time to resolve. Knowing this distribution, we can predict the time it would take a student to resolve
an error and the chance they will resolve it on their own in a given amount of time. This distribution
was determined from pre-2019 compilation data, but we believe the pattern will hold in the present
dataset. If we assume that a power law is always a good approximation for time to correct an error, we
can re-estimate model parameters for subsets of users or subsets of sessions.
        </p>
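        <p>If the power-law assumption is accepted, re-estimating parameters for a subset of users or sessions is a short calculation. The sketch below uses the standard maximum-likelihood estimator for a continuous power law with a chosen lower cutoff x_min, namely alpha = 1 + n / sum(ln(x_i / x_min)), together with the corresponding survival function P(T > t) = (t / x_min)^(-(alpha - 1)). The estimator, the cutoff, and all function names are assumptions made for illustration; the fitting procedure actually used for Figure 1 is not specified here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def fit_power_law_exponent(times, x_min):
    """Maximum-likelihood exponent for a continuous power law on times >= x_min."""
    tail = [t for t in times if t >= x_min]
    log_sum = sum(math.log(t / x_min) for t in tail)
    if not tail or log_sum == 0:
        raise ValueError("need at least one observation strictly above x_min")
    return 1.0 + len(tail) / log_sum


def prob_still_unresolved(t, alpha, x_min):
    """Survival function P(T > t) under the fitted power law."""
    if x_min >= t:
        return 1.0
    return (t / x_min) ** (-(alpha - 1.0))


def prob_resolved_within(elapsed, extra, alpha, x_min):
    """Chance an error already `elapsed` seconds old is fixed in the next `extra` seconds."""
    return 1.0 - (prob_still_unresolved(elapsed + extra, alpha, x_min)
                  / prob_still_unresolved(elapsed, alpha, x_min))


# Synthetic durations in seconds, chosen only so the arithmetic is easy to check.
sample = [1.0, math.e, math.e ** 2]
alpha = fit_power_law_exponent(sample, x_min=1.0)
print(round(alpha, 6))
print(prob_resolved_within(elapsed=10, extra=10, alpha=alpha, x_min=1.0))
```
]]></preformat>
        <p>Under any such fit with alpha greater than 1, prob_resolved_within for a fixed extra interval shrinks as elapsed grows, which is the property that makes error age a useful signal for intervention.</p>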
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Error types</title>
        <p>[...] with a significant number of compilations of advanced and/or experimental activities, which have been
excluded due to the difficulty of separating different use cases.</p>
        <p>To try to answer RQ1, we explored error statistics using different visualizations, and discovered
many patterns. Figure 3 shows that Unfinished List is the most prevalent error, but this is heavily
weighted to Picture slots. This can be explained by the fact that in early lessons, students must be
taught the syntax for lists, which fortunately follows English grammar in requiring commas between
elements, but unfortunately also requires matching square brackets around the list, which students have
not encountered outside of a text-based language. Fortunately, we see that the number of such errors
declines dramatically as students advance from Picture to more advanced slots, and this is in spite of the
confounding factor that Picture slots are sometimes used in more advanced projects for creating assets.</p>
        <p>Excluding Picture slots, Type Mismatch is the most common error, and it becomes more frequent
in Game slots. This can be explained by the fact that before Game slots, children are only using numeric,
string, and list types, as well as Stencils and Shapes. Shapes are created by applying fill or outline
functions to Stencils, and forgetting to do this before applying geometric transformations would
result in a type error. But these type errors are quickly learned, whereas the use of user-defined types
in Game slots creates many more possibilities for type errors, and even students who understand the
mechanisms often make changes to their message or model types and let the compiler show them which
parts of the code need to be adapted, because this strategy works well in functional languages like Elm.</p>
        <p>Most of the remaining errors are easily understood, and follow variations of these patterns of
occurrence, except for Naming Error, which is caused by misspelling a defined name, including type,
function and variable names. It is not unexpected that these errors rise in Game slots, where user-defined
types, a wider range of functions, and definitions (of characters or background elements) are often used.</p>
        <p>It turns out that the error types encountered are far from equally distributed, and knowing
this can help mentors prepare explanations for the most common errors. Moreover, the most common
errors change from slot type to slot type, and there is an easy-to-understand story we can give to
mentors in training, both to prepare them for their expected role in fixing errors and to let them
know that their students will rapidly improve as they progress through the initial lessons.</p>
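        <p>The kind of tabulation behind this comparison can be sketched as a count of error types per slot type. The rows below are made up purely to show the mechanics and are not the study’s data; the function names and the (slot type, error type) row layout are likewise assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict


def errors_by_slot_type(rows):
    """Build a table: slot type -> Counter of error types."""
    table = defaultdict(Counter)
    for slot_type, error_type in rows:
        table[slot_type][error_type] += 1
    return table


def most_common_error(table, slot_type):
    """Return the most frequent error type for a slot type, or None if there is none."""
    ranked = table[slot_type].most_common(1) if slot_type in table else []
    return ranked[0][0] if ranked else None


# Illustrative rows only (not taken from the dataset).
rows = [
    ("Picture", "Unfinished List"),
    ("Picture", "Unfinished List"),
    ("Picture", "Unfinished List"),
    ("Picture", "Type Mismatch"),
    ("Game", "Type Mismatch"),
    ("Game", "Type Mismatch"),
    ("Game", "Naming Error"),
]
table = errors_by_slot_type(rows)
for slot_type in ("Picture", "Game"):
    print(slot_type, most_common_error(table, slot_type))
```
]]></preformat>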
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Mentor dashboard</title>
        <p>
          Figure 4 outlines a prototype design of a mentor dashboard. This prototype was used so mentors could
see historical compilation data play out in the dashboard, so they could gauge the value of different
possible interventions. Although we ultimately did not find useful clusters, the interface was able to
display real-time clustering information (see [
          <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
          ]), as well as simple error age (time since the last
successful compile) in the upper pane, and per-student aggregate compilation data, in the lower pane.
The aging information was considered the most important by mentors, and was integrated into the
production mentor dashboard. Clustering and student aggregate data were not integrated, but remain
options for the future.
        </p>
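        <p>The error-age view can be sketched as follows, under stated assumptions: events are hypothetical (student ID, timestamp in seconds, success flag) tuples, a student with no successful compile yet is aged from their first compile, and only students whose most recent compile failed are listed. None of these names or choices are taken from the production dashboard.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def error_ages(events, now):
    """List (student, age in seconds) for students currently in an error state.

    Age is the time since the last successful compile, or since the first
    compile if the student has not succeeded yet. Oldest errors come first.
    """
    reference = {}  # student -> time of last success (or first compile)
    failing = {}    # student -> True if the most recent compile failed
    for student, timestamp, ok in sorted(events, key=lambda e: e[1]):
        if ok or student not in reference:
            reference[student] = timestamp
        failing[student] = not ok
    ages = [(now - reference[s], s) for s in failing if failing[s]]
    return [(s, age) for age, s in sorted(ages, reverse=True)]


events = [
    ("amy", 0, True), ("amy", 50, False),
    ("bo", 10, False), ("bo", 20, False),
    ("cy", 5, False), ("cy", 90, True),
]
print(error_ages(events, now=100))  # [('amy', 100), ('bo', 90)]
```
]]></preformat>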
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Implications of answers to research questions</title>
      <p>Knowing that syntax and type errors are so skewed made it easier to train mentors. If mentors are
trained to efficiently solve common errors, they will have more time to handle less common errors.
Furthermore, teaching students to avoid the most common errors upfront will reduce the frequency of
those errors. Most errors are associated with the Picture activity, and the most common and fourth-most
common errors are both related to the construction of lists. We know from teachers’ anecdotal reports
that children do not study English grammar as a separate subject, and most do not correctly punctuate
sentences. It is not surprising that they make many errors in constructing lists. In addition to teacher
training, we have developed instructional aids to help mentors prepare students for these challenges.
Figure 5 shows one such example. Children between ages 10 and 12 are most likely to make grammar
mistakes in English because they are the youngest students we regularly teach. For the same reason,
children of this age are the most likely to remember building things with blocks, so we developed a graphical analogy between the
structure of a list and the structure of a house made from blocks. Lacking a broad base, the house is
more likely to collapse, analogous to missing the opening bracket. Lacking a roof, or having a peaked
roof in a middle layer, will also cause problems, analogous to missing a closing bracket, or to pasting
additional shape functions after the list has already been closed. Figure 6 shows a second analogy
to explain Type Mismatch errors. One of the advantages of strict typing in Elm is that functions can
only be composed in meaningful ways, just as blocks can only be connected if they are designed to fit
together. Further study is needed to determine how well such analogies work, and why they work.</p>
      <p>[Figure 5: analogy between the structure of a list and a house made from blocks. Panels, each with a code example: No Error; “UNFINISHED LIST” Error; “UNEXPECTED COMMA” Error (Missing “[”); “UNEXPECTED COMMA” Error (Early “]”).]</p>
      <p>As to the second research question, the specific power law which fit the data unfortunately indicates
that if an error is not solved in less than a minute, it is unlikely to be solved without some intervention.
Additional research to elucidate the “why” might lead to other interventions, but for now, our approach
is to prioritize individual assistance, and to identify students who have solved errors and are available to
help other students. We continue to experiment with dashboards for mentors, instructors and classroom
teachers. A dashboard with near real-time previews of student progress allows teachers to identify
students who are ready to help others diagnose compiler errors, etc. In Figure 7, we show one such
experiment, which combines previews of student work (in this case on a maze challenge) with some [...]</p>
      <p>[Figure 6: block analogy for Type Mismatch errors. Panels, each with a code example: No Error; “TYPE MISMATCH” Error.]</p>
    </sec>
    <sec id="sec-7">
      <title>7. Limitations</title>
      <p>The programs analyzed for this study were from English-speaking schools in Canada, learning a
programming language mostly used by web developers, not educators. The results may not generalize.</p>
      <p>Our outreach program continues to evolve, and the slot system has been replaced by a more flexible
activity system. Our student population is also more diverse, including children with autism and
children in enrichment programs. As a consequence, more recent data does not come segmented in the
way we relied upon for this analysis. The programs themselves have changed, and we make use of a
suite of educational technologies. Beginners can learn about both the basics and some advanced topics
outside the WebIDE, which will change the profile and timing of errors.</p>
      <p>Despite these factors, the most common errors are largely the same, and the teaching methods
we developed remain helpful, although the optimal time to introduce them changes.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions and future work</title>
      <p>We answered both of our research questions. First, we found that resolution time follows
a power law, so most errors are resolved relatively quickly, which makes it easy to decide when to
intervene. Second, we developed a story that explains the most common compiler errors and when students are
most likely to make them. This information has been incorporated into our curriculum and mentor
training.</p>
    </sec>
    <sec id="sec-9">
      <title>Author Contributions</title>
      <p>Conceptualization – Christopher Kumar Anand; methodology – Yiding Li; formulation of tasks analysis
– Kruthiga Karunakaran and Yiding Li; software – Kruthiga Karunakaran and Yiding Li; writing –
original draft – Kruthiga Karunakaran and Vaitheeka Nallasamy; analysis of results – Yiding Li and
Kruthiga Karunakaran; visualization – Christopher Kumar Anand and Stephanie Li; reviewing and
editing – Vaitheeka Nallasamy and Yiding Li. All authors have read and agreed to the published version
of the manuscript.</p>
    </sec>
    <sec id="sec-10">
      <title>Funding</title>
      <p>This study receives funding from the Faculty of Engineering.</p>
    </sec>
    <sec id="sec-11">
      <title>Data Availability Statement</title>
      <p>No new data were created or analysed during this study. Data sharing is not applicable.</p>
    </sec>
    <sec id="sec-12">
      <title>Conflicts of Interest</title>
      <p>The authors declare no conflict of interest.</p>
    </sec>
    <sec id="sec-13">
      <title>Acknowledgments</title>
      <p>We acknowledge financial support from the Faculty of Engineering. We also appreciate the input of
teachers and parents, and the enthusiasm and inspiration of all the future coders we visit.</p>
    </sec>
    <sec id="sec-14">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
  </back>
</article>