Developing a Computational Thinking Test using Bebras problems

James Lockwood
Dept. of Computer Science
Maynooth University, Maynooth, Co. Kildare, Ireland
james.lockwood@mu.ie

Aidan Mooney
Dept. of Computer Science
Maynooth University, Maynooth, Co. Kildare, Ireland
aidan.mooney@mu.ie

Abstract

Assessment is one of the major factors to consider when developing a new course or programme of study. When developing a course to teach Computer Science there are many forms this could take, one of which is linked to Computational Thinking. Whilst developing Computer Science to Go (CS2Go), an introductory course aimed at secondary school students, we have developed a Computational Thinking test based on problems developed for the international Bebras Challenge. This paper describes the content and development of the course, as well as some analysis of results from a year-long study with secondary school students and first-year undergraduate students. We believe, based on our analysis and previous research in the field, that our assessment, built from pre-existing Bebras problems, has the potential to offer educators another way of testing this increasingly discussed skill, Computational Thinking.

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: A. Piotrkowicz, R. Dent-Spargo, S. Dennerlein, I. Koren, P. Antoniou, P. Bailey, T. Treasure-Jones, I. Fronza, C. Pahl (eds.): Joint Proceedings of the CC-TEL 2018 and TACKLE 2018 Workshops, co-located with the 13th European Conference on Technology Enhanced Learning (EC-TEL 2018), 03-09-2018, published at http://ceur-ws.org

1 Introduction

1.1 Computer Science to Go (CS2Go)

Computer Science to Go (CS2Go) is a course designed to teach Computer Science topics with a focus on Computational Thinking. The idea to develop the course arose from a need identified by our research group while working with schools around Ireland through the PACT programme. We observed that teachers were keenly interested in delivering Computer Science lessons, and this led to more schools and teachers joining the programme. It has been our intention from the outset to expand the content on offer and to investigate what other topics and methods could be used [MDN+14]. Because there is little in the way of a full course in Computational Thinking, there was an opportunity and a desire to create a more complete and intensive course for Transition Year, with a view to developing it into a Junior Certificate short course. In Ireland the second-level school system includes an optional, one-year Transition Year (fourth year), taken after the Junior Cycle (first to third year) and before the two-year Leaving Certificate programme, which culminates in a final state exam.

In September 2016, teachers who had previously been involved with our group, as well as others including trainee teachers, were asked for their ideas and input on course design and content. This feedback, in conjunction with input from our group members and an extensive literature review, led to the setting out of the following aims for the course, presented in no particular order:

• Introduce students to Computer Science: what it is, how it can affect their lives, and how they can be involved.
• Improve students' CT and problem-solving skills by making them aware of a problem-solving process and how it can be beneficial in many subjects and areas of life.

• Improve students' understanding of Computer Science, including addressing the imbalance in participation rates across genders and the stereotyped view of who engages in Computer Science.

• Teach students Computer Science concepts such as Algorithms, Cryptography and Sorting/Searching Algorithms, with a focus not just on the concepts themselves but on real-world applications.

• Teach students programming to some level.

Students who have participated in PACT courses in the past have commented that the modules were both enjoyable and a good way to develop programming and other skills such as teamwork. However, they also expressed a desire for more practical applications, and we have worked to ensure that the topics and methods used in this course reflect their feedback [MDN+14]. The new course has since been designed and tested and has been well received by both students and teachers [LM18b].

1.2 Goals of the Test

Assessment is one of the key factors when designing and developing courses for any level of education. One of the things needed to analyse the success and impact of CS2Go was to find or develop a Computational Thinking test. It had to fit the following requirements:

• Be applicable to the target age range (15-17 years old).
• Allow for differentiation between strong and weaker students (i.e. have harder and easier questions).

• Allow students to complete the questions without any prior knowledge.

• Be completed within a 40-minute class period.

• Allow for a pre- and post-test of similar difficulty and content.

• Test students' Computational Thinking skills.

1.3 Computational Thinking

Denning [Den09] suggested that Computational Thinking (CT) has been around since the 1950s as algorithmic thinking, referring to the use of an ordered, precise set of steps to solve a problem and, where appropriate, the use of a computer to do so. Seymour Papert [Pap80] is credited with concretising CT in 1980, but it is since the contribution of Jeannette Wing [Win06], who popularised the term and brought it to the international community's attention, that more and more focus has been placed on CT within education. In her seminal paper, Wing outlined how she believed that all children should be taught CT, placing it alongside reading, writing and arithmetic in terms of importance. She further described it as representing a "universally applicable attitude and skill set everyone, not just computer scientists, would be eager to learn and use" [Win06].

Although academics have failed to agree on a universal definition of CT, Wing defines it as solving problems, designing systems, and understanding human behaviour by drawing on the concepts fundamental to computer science. She states that it is not programming and that it means "more than being able to program a computer. It requires thinking at multiple levels of abstraction" [Win06]. In 2008 Wing posed a question to the computer science, learning sciences and education communities: "What are effective ways of learning (teaching) CT by (to) children?" [Win08]. This in turn raised further questions about what concepts to teach, the order in which these might be taught, and which tools should be used to teach them.

In the meantime, a lot of work has been done around the world and across all levels of education to introduce CT into schools, colleges and after-school clubs, mainly through Computer Science or computing classes and courses. As CT is important to a computer scientist this makes sense; however, it should be noted that being able to think computationally, which includes skills such as decomposition, abstraction, algorithmic thinking and pattern matching, can be of benefit to all disciplines. Bundy [Bun07] has made this point, stating that CT concepts have been used in other disciplines and that the ability to think computationally is essential to every discipline.

A wide array of topics has been used to introduce CT to students. In addition to explicitly teaching students what CT is [GCP14, LHW16], students may be introduced to concepts such as abstraction [AD16, SS15], modelling [CN13], algorithms [AD16, FLM+15, MDN+14], decomposition [AD16] and problem-solving/critical-thinking skills [RFP14, SS15].

1.4 CT Assessment

Assessment of CT is in its infancy and, as such, there are not many methods available for educators to test what is increasingly being described as a central skill for students to possess.

Of note are one effort to develop a Computational Thinking test, called the Computational Thinking Test (CTt), and another project called Dr. Scratch. Dr. Scratch analyses Scratch projects to deliver a CT score based on a number of different metrics [MLRG15]. This is a great tool and we recommend it for analysing the Scratch projects developed in one module of CS2Go. However, as it works exclusively with Scratch, it did not suit our purpose of studying students' "general" CT skills pre- and post-course. The CTt has been developed as a series of multiple-choice questions presented online in either a "maze" or "canvas" interface, and a number of factors define the questions [Gon15]. The group behind the CTt have analysed these two metrics (the CTt and Dr. Scratch) alongside the Bebras problems [RGMLR17]. They found that the CTt was partially convergent with the other two and claim this is to be expected as the three assess CT from different perspectives. They claim that one strength of the CTt is that it can be done in "pure pre-test conditions".
This can allow early detection of problems, but it also does not allow for contextualised assessment. That is a strength of the Bebras problems, which use "real-life" questions, although the same authors claim the "psychometric properties of some of the problems are still far off being demonstrated".

With this being said, having assessed the various forms of Computational Thinking assessment that exist, both through a systematic literature review [LM18a] and through interactions with other researchers and educators, it was decided to develop a test for CS2Go based on the Bebras competition problems.

1.5 Bebras Problems

Bebras is an international competition which aims to promote Computer Science and Computational Thinking among school students of all ages. Participants are usually supervised by teachers and the challenge is performed at schools using computers or mobile devices. As part of their work in schools, the PACT group are involved in the Irish version of this challenge and have designed and used Bebras problems in order to provide teachers with resources to introduce students to Computational Thinking. The problems are designed to be 3-minute-long questions and require no prior knowledge of programming or Computer Science topics. All the problems are linked to topics in Computing such as Cryptography, Trees etc., and this allows them to be used to introduce students to these topics without students even realising they are learning them.

The fact that the Bebras problems are designed to test Computational Thinking skills means they are well suited to testing students' Computational Thinking skills before and after the course. Gouws et al. [GBW13] previously used the South African version of Bebras in a similar manner, and it was this that inspired the development of our own Computational Thinking test. Other studies have also been carried out on the Bebras problems to investigate their effectiveness and to compare them to other Computational Thinking tests [HM15, Van14, DS16, HM14].

2 Methodology

The current format of the Bebras challenge does not suit use as a comparative test, as the questions change each year. The challenge is also often conducted on PCs, and we wanted to allow teachers to administer it either on paper or online as desired. It was decided that 13 questions would be used in each test, with students allowed 35 minutes to complete them. This accounts both for the 3-minute design of the questions and for the fact that some of the questions are designed for a younger age group than the target demographic. It was hoped that each test would be as close as possible to the other in terms of difficulty level as well as question topic and type. To do this, many questions from Bebras challenges across the world were examined and critiqued.

The questions used in the UK challenges were deemed most appropriate, and the contents of the tests were sourced from the 2015 and 2016 challenges. For the target age group (15-17-year-olds) the UK challenge involves 18 multiple-choice questions over 40 minutes. As explained previously, this was adjusted slightly for our purposes to be shorter, and it also allowed for some non-multiple-choice questions. The first criterion for the tests was to ensure that they were as close in terms of difficulty level as possible. The UK Bebras challenge is broken into six age groups, as presented in Table 1.

Table 1: Bebras UK Sections
Group Name | Year Group | Approx. age
Kits | 2 & 3 | 6-8
Castors | 4 & 5 | 8-10
Juniors | 6 & 7 | 10-12
Intermediate | 8 & 9 | 12-14
Seniors | 10 & 11 | 14-16
Elites | 12 & 13 | 16-18

Each age group is then further divided into three Sections, namely Section A, Section B and Section C. Questions in Section A are considered the easiest, with Section C problems being the more complex. Questions that are submitted for the Bebras challenge are reviewed by a panel of experts in Computing education who are involved in the challenge. Questions that are accepted for either the qualification rounds or the final challenge are often used in multiple age groups and across the three Sections.

To ensure that the two created tests were as similar in difficulty as possible, these ratings were used to select questions for each test, ensuring that corresponding problems were used in at least one common Section and age group.
The chosen corresponding problems for the tests, along with the Sections they have in common, can be seen in Table 2. For the complete set of problems consult goo.gl/XDRHbq.

Table 2: Matching sections of the tests
Test 1 Question | Common Sections | Test 2 Question
Bracelet | Kits B, Castors A | Bebras Painting
Animation | Castors B, Juniors A | Bottles
Animal Competition | Castors B, Intermediate A | Party guests
Cross Country | Intermediate A | Tube System
Stack computer | Senior B, Elite A | Pirate Hunters
Throw the dice | Juniors C | Magic potion
Drawing stars | Intermediate B | Concurrent directions
Beaver lunch | Senior B | Theatre
You won't find it | Intermediate C, Elite A | Secret messages
Bowl Factory | Intermediate C, Elite B | Triangles
Fireworks | Senior C | Scanner code
Kangaroo | Elite C | The Game
Spies | Elite C | B-enigma

The second criterion for the tests was to have similar topics and styles for the questions where possible, and to have these topics relate to areas covered in the course. This was not as high a priority as the difficulty, so questions were considered even when this wasn't possible. Table 3 presents the topics covered by each question in each test.

Table 3: Topics of the questions
Test 1 | Topic | Test 2 | Topic
Bracelet | Pattern matching | Bebras Painting | Algorithms
Animation | Attributes and variables | Bottles | Sorting
Animal Competition | Data ordering | Party guests | Graphs
Stack computer | Stacks | Pirate Hunters | Graphs
Throw the dice | If-then-else | Magic potion | Logic & binary
Drawing stars | Objects | Concurrent directions | Parallel instructions
Beaver lunch | Trees | Theatre | Sequences
You won't find it | Ciphering | Secret messages | Ciphering
Bowl Factory | Sorting | Triangles | Iteration, pattern matching
Fireworks | Encoding | Scanner code | Pixels
Spies | Gossip problem | B-enigma | Encrypting

Prior to either of the tests being used, they were trialled by a small group to ensure that the questions were clear and made sense, that our timing (35 minutes) was reasonable, and that both sets of questions appeared similar in terms of difficulty. The group found that the second test was perhaps slightly harder, but that it was doable in 35 minutes and that the questions were in general clear.

To further assess that the two tests are similar in difficulty, and to validate their effectiveness, the questions were sent out to teachers, undergraduate and postgraduate students and third-level academic staff with instructions on how to rate the questions' difficulty. The hope was that this sample of different demographic and career groups would not only show that the two tests are similar in difficulty but also allow us to weight specific questions, or one of the tests, accordingly if there was a discrepancy. Tables 4, 5 and 6 present the qualifications and areas of work of the participants. There was a mixture of genders and ages but this data was not collected; this group will be referred to as the panel from now on.

Table 4: Qualification profile of the panel
Highest Qualification | No. of participants
PhD | 5
Masters | 1
Bachelors Degree | 10
Leaving Certificate | 3
Unspecified | 1

Table 5: Job profile of the panel
Job Title | No. of participants
Lecturer | 5
Primary school teacher | 2
Secondary school teacher | 1
Tutor/Postgraduate Student | 5
Youth worker | 2
Nurse/Veterinary Nurse | 2
Undergraduate Student | 2
Unspecified | 1

Table 6: Area of work of the panel
Area of work | No. of participants
Computer Science | 9
Irish | 1
Mathematics | 1
Electronic Engineering | 1
Youth work | 2
Medicine | 2
Teaching | 3
Unspecified | 1

We asked the panel to rank the questions for us on two scales. Twenty people completed this task for Test 1, with 18 of those also completing it for Test 2. The first scale rated the questions in each test from easiest to hardest, giving each question a ranking from 1 to 13. To further enhance this ranking a second scale was needed, as two questions might be classified as the easiest pair yet still have a big gap in difficulty between them; the same could be true of any two questions. Since each test had 13 questions, it was decided that a scale from 1-10 would not allow the panel to be precise and would in fact limit the ranking. A scale of 1-20 was decided upon, with 1 being easiest and 20 hardest. The panel weren't given further instruction unless they requested it; they were free to rank the questions as they saw fit.
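The per-question averages reported in the next section can be reproduced from the raw panel responses in a few lines. The sketch below is illustrative only: it assumes a simple in-memory layout for the panel data (one entry per rater per question) and is not the analysis pipeline actually used in the study.

```python
# Minimal sketch (assumed data layout, not the study's actual pipeline):
# average each question's 1-13 rank and 1-20 difficulty score over the panel.
from statistics import mean

# Hypothetical panel responses: {rater: {question: (rank_1_to_13, score_1_to_20)}}
panel = {
    "rater_a": {"Bracelet": (1, 2), "Animation": (2, 5), "Spies": (13, 16)},
    "rater_b": {"Bracelet": (2, 4), "Animation": (3, 6), "Spies": (12, 14)},
}

questions = sorted({q for ratings in panel.values() for q in ratings})
summary = {
    q: (
        mean(r[q][0] for r in panel.values() if q in r),  # mean 1-13 rank
        mean(r[q][1] for r in panel.values() if q in r),  # mean 1-20 score
    )
    for q in questions
}

# Order from easiest to hardest by mean 1-20 score, as in Tables 7 and 11.
for q, (avg_rank, avg_score) in sorted(summary.items(), key=lambda kv: kv[1][1]):
    print(f"{q}: mean rank {avg_rank:.2f}, mean difficulty {avg_score:.2f}")
```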
3 Results

Table 7 presents the scores each question received when the panel rated the Test 1 questions on the scale from 1-20. The questions are presented from easiest to hardest based on the average scores in the table.

Table 7: Rating of Test 1 Questions out of 20
Rank | Question | Average out of 20
1 | Bracelet | 3.2
2 | Animation | 5.6
3 | Cross Country | 6.15
4 | Beaver Lunch | 7.6
5 | Drawing Stars | 7.6
6 | Throw the Dice | 7.7
7 | You Won't Find It | 8.75
8 | Animal Competition | 8.95
9 | Kangaroo | 9.05
10 | Fireworks | 9.45
11 | Bowl Factory | 12.6
12 | Stack Computer | 13.5
13 | Spies | 15.1
Average | | 8.87

There isn't much of a difference between this ranking and the ranking participants gave the questions out of 13. This is to be expected, and the two rankings are presented in Table 8 to show the comparison.

Table 8: Test 1 Comparison
Question (ranked 1-13) | Rank | Question (ranked 1-20)
Bracelet | 1 | Bracelet
Animation | 2 | Animation
Cross Country | 3 | Cross Country
Throw the Dice | 4 | Beaver Lunch
Drawing Stars | 5 | Drawing Stars
Beaver Lunch | 6 | Throw the Dice
You Won't Find It | 7 | You Won't Find It
Kangaroo | 8 | Animal Competition
Animal Competition | 9 | Kangaroo
Fireworks | 10 | Fireworks
Stack Computer | 11 | Bowl Factory
Bowl Factory | 12 | Stack Computer
Spies | 13 | Spies
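The similarity between the two orderings in Table 8 can also be quantified with a rank correlation. The paper does not report such a statistic; the following is simply an illustrative check using SciPy, with the question order transcribed from Table 8.

```python
# Illustrative check (not reported in the paper): Spearman rank correlation
# between the 1-13 ordering and the 1-20 ordering of the Test 1 questions.
from scipy.stats import spearmanr

order_1_13 = ["Bracelet", "Animation", "Cross Country", "Throw the Dice",
              "Drawing Stars", "Beaver Lunch", "You Won't Find It", "Kangaroo",
              "Animal Competition", "Fireworks", "Stack Computer",
              "Bowl Factory", "Spies"]
order_1_20 = ["Bracelet", "Animation", "Cross Country", "Beaver Lunch",
              "Drawing Stars", "Throw the Dice", "You Won't Find It",
              "Animal Competition", "Kangaroo", "Fireworks", "Bowl Factory",
              "Stack Computer", "Spies"]

position_in_20 = {q: i for i, q in enumerate(order_1_20, start=1)}
rho, p_value = spearmanr(list(range(1, 14)),
                         [position_in_20[q] for q in order_1_13])
print(f"Spearman rho = {rho:.3f}, p = {p_value:.4f}")
```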
It should be noted that Beaver Lunch, Drawing Stars and Throw the Dice were rated as having almost exactly the same level of difficulty, with scores of 7.6, 7.6 and 7.7 out of 20 respectively (see Table 7). It should also be noted that there is a large jump in difficulty from 10th to 11th position (Fireworks to Bowl Factory). Kangaroo and Fireworks are rated 9.05 and 9.45 respectively, but the scores then jump significantly to 12.6, 13.5 and 15.1 for Bowl Factory, Stack Computer and Spies. A similar gap can be seen when going from the first three questions to the 4th question: Bracelet, Animation and Cross Country are rated 3.2, 5.6 and 6.15 respectively, which in itself covers a broad range, and the score then jumps up to 7.6 for Beaver Lunch and Drawing Stars.

This lines up roughly with the age categories the questions were used in during the Bebras competition. Table 9 presents a comparison between these three orderings. For the original category and UK results we have used the highest category each question was entered in, which can be seen in the table.

Table 9: Test 1 Extensive Ranking
Rank | Col 1: Ranking from 1-13 | Col 2: Scores from 1-20 | Col 3: Bebras Category (Highest)
1 | Bracelet (1.75) | Bracelet (3.2) | Bracelet (Inter A)
2 | Animation (4.8) | Animation (5.6) | Animation (Inter A)
3 | Cross Country (4.95) | Cross Country (6.15) | Animal Competition (Inter A)
4 | Throw the Dice (5.7) | Beaver Lunch (7.6) | Cross Country (Inter A)
5 | Drawing Stars (6.3) | Drawing Stars (7.6) | Beaver Lunch (Senior B)
6 | Beaver Lunch (6.65) | Throw the Dice (7.7) | Throw the Dice (Senior B)
7 | You Won't Find It (7.15) | You Won't Find It (8.75) | Fireworks (Senior C)
8 | Kangaroo (7.45) | Animal Competition (8.95) | You Won't Find It (Elite A)
9 | Animal Competition (7.6) | Kangaroo (9.05) | Stack Computer (Elite A)
10 | Fireworks (7.95) | Fireworks (9.45) | Drawing Stars (Elite A)
11 | Stack Computer (9.6) | Bowl Factory (12.6) | Bowl Factory (Elite B)
12 | Bowl Factory (9.85) | Stack Computer (13.5) | Kangaroo (Elite C)
13 | Spies (11.25) | Spies (15.1) | Spies (Elite C)

If we use the rankings in each of these columns we can attempt to rank the questions across all three columns to give an overall ranking. For example, Bracelet was ranked 1 in Column 1 and 1 in Column 2, giving a score of 2 if we simply add these numbers together. If scores are identical in any of the columns then the tied questions are given the same value, e.g. in Column 2 Beaver Lunch and Drawing Stars both have a score of 7.6, so they'll both be given a value of 5 (i.e. that of the highest-placed question of the two). In Column 3 a value is given relating to the position of the highest question; for example, You Won't Find It, Stack Computer and Drawing Stars were all used in the Elite A category, so they will all be given a value of 10, as Drawing Stars is the highest placed in the list.

Doing this for each question we can then rank them from 1 to 13, with 1 being the easiest question (the lowest total across the three columns) and 13 being the hardest (the largest total across the three columns). This ranking is shown in Table 10.

Table 10: Test 1 Ranking
Rank | Question | Total Score | Breakdown*
1 | Bracelet | 6 | 1+1+4
2 | Animation | 8 | 2+2+4
3 | Cross Country | 10 | 3+3+4
4 | Throw the Dice | 14 | 4+6+6
5 | Beaver Lunch | 17 | 6+5+6
6 | Drawing Stars | 20 | 5+5+10
7 | Animal Competition | 21 | 9+8+4
8 | You Won't Find It | 24 | 7+7+10
9 | Fireworks | 27 | 10+10+7
10 | Kangaroo | 30 | 8+9+13
11 | Stack Computer | 33 | 11+12+10
 | Bowl Factory | 33 | 12+11+10
13 | Spies | 39 | 13+13+13
*Rank of (col 1 + col 2 + col 3) from Table 9
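The combination rule just described (per-column positions, ties sharing the value of the highest-placed member, and the three positions summed) is mechanical enough to express in a few lines. The sketch below is our reading of that rule applied to toy data; it is not the authors' own script, and the data layout is assumed.

```python
# Sketch of the combination behind Tables 10 and 14 (toy data, assumed layout).
def positions(column):
    """column: list of (question, key) ordered easiest -> hardest.
    Questions whose keys tie all receive the position of the last-placed
    member of the tie, matching the rule described above for Columns 2 and 3."""
    last_pos = {}
    for i, (_, key) in enumerate(column, start=1):
        last_pos[key] = i                    # keep the largest position per key
    return {question: last_pos[key] for question, key in column}

# Toy example with three questions; keys are the 1-13 means, the 1-20 means and
# the highest Bebras category, listed in the order they appear in Table 9.
col1 = positions([("Bracelet", 1.75), ("Animation", 4.8), ("Spies", 11.25)])
col2 = positions([("Bracelet", 3.2), ("Animation", 5.6), ("Spies", 15.1)])
col3 = positions([("Bracelet", "Inter A"), ("Animation", "Inter A"),
                  ("Spies", "Elite C")])

totals = {q: col1[q] + col2[q] + col3[q] for q in col1}
for rank, (q, total) in enumerate(sorted(totals.items(), key=lambda kv: kv[1]), 1):
    print(rank, q, total)
```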
The rankings shown in Table 10 would allow us to weight questions by awarding a higher mark for a correct answer on harder questions and a lower mark for a correct answer on easier questions. This hasn't been deemed necessary at this stage, but further results might lead us to do so, especially for hard questions such as Spies.

What we can deduce from these rankings is that the questions can be split into three difficulty levels. Questions in ranks 1-4 all have a total of less than 15 and can be seen as the easiest four questions; questions 5-10 have totals of between 15 and 30 and can be seen as intermediate questions; and questions 11-13 have totals of over 30 and can be seen as the hardest questions.

3.1 Test 2

Table 11 presents the scores for each Test 2 question as rated by the panel on the scale from 1-20, again presented from easiest to hardest. Unlike in Test 1, there appears to be more of a difference between this ranking and the ratings given from 1-13, as shown in Table 12.

Table 11: Rating of Test 2 Questions out of 20
Rank | Question | Average out of 20
1 | Bebras Painting | 3.72
2 | Tube System | 5.94
3 | Concurrent Directions | 7.72
4 | Magic Potion | 8
5 | Theatre | 8.06
6 | Bottles | 8.44
7 | Party Guest | 8.44
8 | Secret Messages | 8.78
9 | Triangles | 10.89
10 | Scanner Code | 11.56
11 | B-Enigma | 11.78
12 | The Game | 11.78
13 | Pirate Hunters | 13.44
Average | | 9.12

Table 12: Test 2 Comparison
Question (ranked 1-13) | Rank | Question (ranked 1-20)
Bebras Painting | 1 | Bebras Painting
Tube System | 2 | Tube System
Magic Potion | 3 | Concurrent Directions
Party Guest | 4 | Magic Potion
Bottles | 5 | Theatre
Theatre | 6 | Bottles
Concurrent Directions | 7 | Party Guest
Secret Messages | 8 | Secret Messages
Scanner Code | 9 | Triangles
B-Enigma | 10 | Scanner Code
Triangles | 11 | B-Enigma
The Game | 12 | The Game
Pirate Hunters | 13 | Pirate Hunters

It is interesting to note that with this test there appear to be a few more discrepancies between the two rankings. From questions 3-11 several questions are "out of place", some by just one rank (like Theatre and Bottles in ranks five and six) or, in the case of Concurrent Directions and Party Guest, by three or four ranks. The reason for this is that these questions were all rated very similarly by most people. In terms of the difficulty score (from 1-20), there is only one point separating Concurrent Directions (7.72) in third position and Secret Messages (8.78) in eighth position, which is what leads to the slight mismatch in those positions. Similarly, there is only one point separating Triangles (10.89) in ninth place and The Game (11.78) in 12th place. Also of interest is the fact that Bottles and Party Guest (both 8.44), and The Game and B-Enigma (both 11.78), were rated with the same level of difficulty.

These rankings line up roughly with the age categories the questions were used in during the Bebras competition. Table 13 presents a comparison between these three orderings. For the original category and UK results we have used the highest category each question was entered in, which can be seen in the table.

Table 13: Test 2 Extensive Ranking
Rank | Column 1: Our analysis (ranking from 1-13) | Column 2: Our analysis (scores from 1-20) | Column 3: Bebras Category (Highest)
1 | Bebras Painting (2.22) | Bebras Painting (3.72) | Bebras Painting (Castors A)
2 | Tube System (4.33) | Tube System (5.94) | Bottles (Junior A)
3 | Magic Potion (5.39) | Concurrent Directions (7.72) | Party Guests (Inter A)
4 | Party Guest (5.78) | Magic Potion (8) | Tube System (Inter A)
5 | Bottles (6.22) | Theatre (8.06) | Concurrent Directions (Senior A)
6 | Theatre (6.5) | Bottles (8.44) | Pirate Hunters (Elite A)
7 | Concurrent Directions (6.78) | Party Guest (8.44) | Magic Potion (Elite A)
8 | Secret Messages (7) | Secret Messages (8.78) | Secret Messages (Elite A)
9 | Scanner Code (8.89) | Triangles (10.89) | Theatre (Elite B)
10 | B-enigma (8.94) | Scanner Code (11.56) | Scanner Code (Elite B)
11 | Triangles (9) | B-enigma (11.78) | Triangles (Elite B)
12 | The Game (9.89) | The Game (11.78) | The Game (Elite C)
13 | Pirate Hunters (9.94) | Pirate Hunters (13.44) | B-enigma (Elite C)

If we use the rankings in each of these columns we can rank the questions across all three columns to give an overall ranking. For example, Bebras Painting was ranked 1 in Column 1 and 1 in Column 2, giving a score of 2 when added together. If scores are identical in any of the columns then the tied questions are given the same value, e.g. in Column 2 B-enigma and The Game have the same score, so they'll both be given a value of 12 (i.e. that of the highest-placed question of the two). In Column 3 the value of the highest-placed question is given; for example, Theatre, Scanner Code and Triangles were all used in Elite B, so they will all be given a value of 11, as Triangles is the highest placed in the list.

Doing this for each question we can then rank them from 1 to 13, with 1 being the easiest question (the lowest total across the three columns) and 13 being the hardest (the largest total across the three columns). This ranking is shown in Table 14.

Table 14: Test 2 Ranking
Rank | Question | Total Score | Breakdown*
1 | Bebras Painting | 3 | 1+1+1
2 | Tube System | 8 | 2+2+4
3 | Bottles | 14 | 5+7+2
4 | Party Guest | 15 | 4+7+4
 | Magic Potion | 15 | 3+4+8
 | Concurrent Directions | 15 | 7+3+5
7 | Theatre | 22 | 6+5+11
8 | Secret Messages | 24 | 8+8+8
9 | Scanner Code | 30 | 9+10+11
10 | Triangles | 31 | 11+9+11
11 | Pirate Hunters | 34 | 13+13+8
12 | B-enigma | 35 | 10+12+13
13 | The Game | 37 | 12+12+13
*Rank of (col 1 + col 2 + col 3) from Table 13

As with Test 1, the rankings shown in Table 14 would allow us to weight questions by awarding a higher mark for a correct answer on harder questions and a lower mark for a correct answer on easier questions. This hasn't been deemed necessary at this stage, but further results might lead us to do so for especially hard questions such as The Game.

Similarly to Test 1, we can deduce from these rankings that the questions can be split into three difficulty levels. Questions in ranks 1-3 all have a total of less than 15 and can be seen as the easiest questions; questions 4-9 have totals of between 15 and 30 and can be seen as intermediate questions; and questions 10-13 have totals of over 30 and can be seen as the hardest questions. These divisions are similar to the divisions shown in Test 1.
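The weighting mentioned above has not been applied to any of the results reported in this paper; if it were, it could be as simple as attaching a mark per difficulty tier. In the sketch below the tier boundaries follow the split described in Section 3 (totals under 15, 15-30, and over 30), while the marks per tier are purely hypothetical.

```python
# Hypothetical weighting sketch; not used in the reported results.
TIER_MARKS = {"easy": 1, "intermediate": 2, "hard": 3}   # assumed weights

def tier(total):
    """Difficulty tier from the combined totals in Table 10 / Table 14."""
    if total < 15:
        return "easy"
    if total <= 30:
        return "intermediate"
    return "hard"

def weighted_score(question_totals, answers):
    """question_totals: {question: combined total}; answers: {question: bool}."""
    return sum(TIER_MARKS[tier(question_totals[q])]
               for q, correct in answers.items() if correct)

# Example: one correct answer from each tier of Test 2 -> 1 + 2 + 3 = 6 marks.
totals = {"Bebras Painting": 3, "Secret Messages": 24, "The Game": 37}
print(weighted_score(totals, {q: True for q in totals}))
```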
4 Findings

Using the average difficulty of the two tests as presented in Table 7 and Table 11, it can be seen that both tests are of a similar difficulty level. Test 1 questions were rated on average at a perceived difficulty of 8.87 and Test 2 questions at 9.12. This leads us to conclude that the two tests have a similar difficulty rating.

These tests were run over the course of the 2017-18 academic year in a number of schools as well as with a first-year undergraduate CS course. The test was run in schools as part of the wider CS2Go roll-out, and it was decided that running it with the undergraduate students would be helpful as they are a larger, more consistent sample. It was also felt that those students could benefit from the problem-solving aspect of the assessment. With both cohorts it was hoped to see whether the test could be completed in the 35-minute time period allotted. It was also hoped that the results would show that the test targets students' Computational Thinking skills. This is hard to define precisely, so we used students' previous mathematics and programming experience as metrics to compare groups. Mathematical ability has been shown to be a predictor of success in programming [QBM15], and most would agree that programming is a specific way of testing CT skills.

4.1 Overall Results

4.1.1 School data

A total of 200 students took at least one of the problem-solving tests. Of those 200 students, 187 took Test 1 and 76 took Test 2. The decrease in numbers is due to some schools not completing all of the feedback and assessment at the end of the year. This could have been because their teachers did not use the content much or did not have time to re-test the students.

Table 15 shows the results of the tests grouped by those who took at least one test and those who took both tests. It can be seen that students in both groups performed slightly better in the second test than in the first. For the whole population this is to a significant level (T-score = 2.473, P-value = 0.014), but for those who took both tests there is no significant difference (T-score = 0.159, P-value = 0.873).

Table 15: School results of the tests, where n is the number of students taking the test
 | Average of those who took at least one test | Average of those who took both tests
Test 1 | 5.806 (n = 187) | 6.527 (n = 55)
Test 2 | 6.627 (n = 76) | 6.6 (n = 55)
These scores are all out of 13.

4.1.2 Undergraduate Students

A total of 292 students took at least one of the problem-solving tests. Of those 292 students, 263 took Test 1 and 180 took Test 2. The decrease in numbers is due to students changing course, only needing to complete one semester of CS, and other unrelated circumstances.

Table 16 shows the results of the tests grouped by those who took at least one test and those who took both tests. It can be seen that a total of 174 took both tests. Students performed marginally better in Test 2 compared to Test 1, but this isn't a significant difference (T-score = 0.129, P-value = 0.897). This increase is also found across the whole population, with the averages being 7.689 for Test 1 (n = 263) and 7.933 for Test 2 (n = 180) (T-score = 1.17, P-value = 0.24).

Table 16: Undergraduate results of the tests, where n is the number of students taking the test
 | Average of those who took at least one test | Average of those who took both tests
Test 1 | 7.689 (n = 263) | 7.988 (n = 174)
Test 2 | 7.933 (n = 180) | 8.03 (n = 174)
These scores are all out of 13.

As stated in Section 1.2, one of the hopes of these studies was to show that the Bebras problems challenge students' Computational Thinking skills. For this we have looked at students who had previous programming experience and those who took Higher Level Mathematics at the Leaving Certificate.

Table 17 shows that students who took Higher Level maths performed significantly better in both Test 1 (T-score = 2.768, P-value = 0.006) and Test 2 (T-score = 3.409, P-value = 0.001). This is encouraging, as mathematical ability and Computational Thinking can be seen as closely related skill-sets. Also of interest, those who studied Ordinary Level Mathematics performed slightly worse in Test 2 than in Test 1, whereas those who studied Higher Level improved slightly; it should be noted that neither group's scores changed significantly over the two tests.

It can also be seen that those who had previous programming experience performed better in Test 1 than those who had no experience, but not to a significant level (T-score = 0.853, P-value = 0.395). Interestingly, not only was the numerical gap closed in this demographic by Test 2, but it had swung the other way, with those who had no previous experience out-performing their peers, although neither group's scores changed to a significant level. This is encouraging, as it could help to show that the content covered by introductory CS courses, namely programming and low-level theory, is beneficial for Computational Thinking skills.

Table 17: Undergraduate demographic comparisons
Demographic | Test 1 Avg | Test 2 Avg
Studied OL (n = 44) | 7.386 | 7.273
Studied HL (n = 110) | 8.445 | 8.509
PP (n = 71) | 8.225 | 8.085
NPP (n = 88) | 7.92 | 8.102
OL = Ordinary Level Mathematics; HL = Higher Level Mathematics; PP = Previous Programming Experience; NPP = No Previous Programming Experience
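The T-scores and P-values quoted in this section come from comparing the score distributions of two groups of students. A sketch of that kind of comparison is shown below; since the per-student raw scores are not published here, the two lists are placeholders, and the Welch (unequal-variance) variant is used as one reasonable choice rather than necessarily the exact test applied in the study.

```python
# Placeholder sketch of a two-group comparison (e.g. Higher Level vs Ordinary
# Level mathematics on Test 1); the score lists are invented for illustration.
from scipy.stats import ttest_ind

hl_scores = [9, 8, 10, 7, 11, 8, 9]   # hypothetical Test 1 scores out of 13
ol_scores = [6, 7, 8, 5, 9, 7]        # hypothetical Test 1 scores out of 13

t_stat, p_value = ttest_ind(hl_scores, ol_scores, equal_var=False)
print(f"T-score = {t_stat:.3f}, P-value = {p_value:.3f}")
```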
One way in which we can compare the two different cohorts is by looking at the percentage of students who got each question right. We would expect the undergraduate students to perform better on most, if not all, questions across both tests.

4.2 Bebras Test 1

It can be seen from Table 18 that in Test 1 the undergraduate students performed better than those in secondary school on almost every question. Many of the gaps between the two groups are in the range of 15-30%. This is to be expected, as the undergraduates have been through at least two more years of education, and one of the expressed goals of the Leaving Certificate is to develop students into critical and creative thinkers (https://www.curriculumonline.ie/getmedia/161b0ee4-706c-4a7a-9f5e-7c95669c629f/KS_Framework.pdf), which is all connected to Computational Thinking. The only question where the secondary school students out-performed the undergraduates was the Fireworks question.

Table 18: Our Results Test 1
Question | Undergrads (n=277) | Schools (n=186)
Bracelet | 94.6% | 94.1%
Animation | 70.8% | 56.5%
Cross Country | 68.6% | 45.2%
Throw the Dice | 77.9% | 50.5%
Drawing stars | 82.3% | 60.2%
Beaver Lunch | 33.2% | 29%
You won't find it | 89.2% | 74.2%
Bowl Factory | 18.4% | 11.8%
Fireworks | 42.2% | 47.8%
Kangaroo | 56.7% | 40.3%
Spies | 29.6% | 17.2%
Animal Competition | 64.9% | 39.8%
Stack Computer | 44% | 13.9%
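The percentages in Tables 18 and 19 are simply the proportion of each cohort answering a question correctly. A small sketch of that computation, over a hypothetical set of response records, is given below.

```python
# Sketch: per-question percentage correct by cohort (hypothetical records).
from collections import defaultdict

# Each record: (cohort, question, answered_correctly)
responses = [
    ("undergrad", "Bracelet", True), ("undergrad", "Bracelet", False),
    ("school", "Bracelet", True), ("school", "Bracelet", True),
]

counts = defaultdict(lambda: [0, 0])      # (cohort, question) -> [correct, total]
for cohort, question, correct in responses:
    counts[(cohort, question)][0] += int(correct)
    counts[(cohort, question)][1] += 1

for (cohort, question), (correct, total) in sorted(counts.items()):
    print(f"{question} ({cohort}): {100 * correct / total:.1f}% correct of {total}")
```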
4.3 Bebras Test 2

From Table 19 we can see that, as in Test 1, the undergraduate students performed better than the secondary school students in Test 2. The gaps this time are generally smaller, with most being around 10%. The secondary school students again performed better on one question, The Game. This is interesting because, based on the percentage of students who got the question right and on our own analysis discussed in Section 3, The Game was identified as being one of the hardest questions across both tests.

Table 19: Our Results Test 2
Question | Undergrad (n=197) | School (n=75)
Bebras Painting | 81.7% | 58.7%
Bottles | 90.4% | 84%
Party Guests | 88.3% | 70.7%
Tube System | 67.5% | 57.3%
Pirate Hunters | 42.1% | 33.3%
Magic Potion | 83.2% | 69.3%
Concurrent Directions | 78.7% | 66.7%
Theatre | 32.5% | 21.3%
Secret Messages | 86.8% | 86.7%
Triangles | 47.2% | 34.7%
Scanner Code | 36.5% | 32%
The Game | 3.6% | 5.3%
B-Enigma | 51.3% | 42.7%

5 Conclusions

Based on the analysis from our panel we can conclude that the two tests are of approximately equal difficulty. As discussed in Sections 1.4 and 1.5, the Bebras problems have been developed to test participants' CT skills, and they compare well to other existing tests such as the CTt. This is further backed up by our findings from the undergraduate students, as those who had previously programmed and who studied Higher Level mathematics achieved higher results in Test 1.

One advantage of this test is its ability to be administered both online and through paper question-and-answer sheets. However, it is clear that, with technology use in schools becoming more commonplace, online submission is preferable. This is also true from a data-collection point of view, as it can save time and effort as well as provide almost immediate results. The results presented here were collected via both paper answer sheets and Google Forms for online responses. This has worked as a stop-gap, but a more robust and controlled system is needed. To that end, a web system for the entire CS2Go course, as well as the assessment tools described here, has been developed over the past year. It will go live this summer, and it is hoped this will allow easier access for both educators and our research group to data and course content.

One interesting development that we plan to pursue is to create "equivalent" Bebras problems. Each Bebras exercise is usually based around a specific CS-related concept or problem. To make the tests equivalent not only in difficulty but also in topic, we would have to develop Bebras exercises which have the same underlying concept or idea but a different story or real-world application. This is no easy task, but if a method could be developed it would not only help our test but also allow the Bebras challenge itself to develop similar questions year after year.

An area of interest in our research group is methods of predicting success in programming courses. Being able to implement interventions to help students identified as potentially struggling is vitally important and beneficial to all educators. If this test could be shown to predict success in programming, or general academic success, it could be a helpful tool for educators. We plan to use the data obtained from the undergraduate students and their final grades to begin to see if this is possible.

6 Acknowledgements

Thank you to all of those who submitted rankings.
References

[AD16] S. Atmatzidou and S. Demetriadis. Advancing students' computational thinking skills through educational robotics: A study on age and gender relevant differences. Robotics and Autonomous Systems, 75:661–670, 2016.

[Bun07] A. Bundy. Computational thinking is pervasive. Journal of Scientific and Practical Computing, 1(2):67–69, 2007.

[CN13] M. E. Caspersen and P. Nowack. Computational thinking and practice: A generic approach to computing in Danish high schools. In Proceedings of the 15th Australasian Computing Education Conference, pages 137–143, January 2013.

[Den09] Peter J. Denning. The profession of IT: Beyond computational thinking. Communications of the ACM, 52(6):28–30, 2009.

[DS16] V. Dagiene and G. Stupuriene. Bebras: a sustainable community building model for the concept based learning of informatics and computational thinking. Informatics in Education, 2016.

[FLM+15] R. Folk, G. Lee, A. Michalenko, A. Peel, and E. Pontelli. GK-12 DISSECT: Incorporating computational thinking with K-12 science without computer access. In Proceedings of the Frontiers in Education Conference (FIE), October 2015.

[GBW13] L. Gouws, K. Bradshaw, and P. Wentworth. First year student performance in a test for computational thinking. In Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference, pages 271–277. ACM, October 2013.

[GCP14] S. Grover, S. Cooper, and R. Pea. Assessing computational learning in K-12. In Proceedings of the Conference on Innovation & Technology in Computer Science Education, pages 57–62, June 2014.

[Gon15] M. R. González. Computational thinking test: Design guidelines and content validation. In Proceedings of EDULEARN15, 2015.

[HM14] P. Hubwieser and A. Mühling. Playing PISA with Bebras. In Proceedings of the 9th Workshop in Primary and Secondary Computing Education (WiPSCE), 2014.

[HM15] P. Hubwieser and A. Mühling. Investigating the psychometric structure of the Bebras contest: towards measuring computational thinking skills. In Learning and Teaching in Computing and Engineering (LaTiCE), 2015.

[LHW16] W. L. Li, C. F. Hu, and C. C. Wu. Teaching high school students computational thinking with hands-on activities. In Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education, pages 371–371, July 2016.

[LM18a] J. Lockwood and A. Mooney. Computational thinking in secondary education: Where does it fit? A systematic literary review. International Journal of Computer Science Education in Schools, 2018:41–60, January 2018.

[LM18b] J. Lockwood and A. Mooney. A pilot study investigating the introduction of a computer-science course focusing on computational thinking at second level. The Irish Journal of Education/Iris Éireannach an Oideachais, forthcoming 2018.

[MDN+14] A. Mooney, J. Duffin, T. Naughton, R. Monahan, J. Power, and P. Maguire. PACT: An initiative to introduce computational thinking to second-level education in Ireland. In Proceedings of the International Conference on Engaging Pedagogy (ICEP), 2014.

[MLRG15] J. Moreno-León, G. Robles, and M. Román-González. Dr. Scratch: Automatic analysis of Scratch projects to assess and foster computational thinking. Revista de Educación a Distancia, 2015.

[Pap80] Seymour Papert. Mindstorms: Children, Computers, and Powerful Ideas. Basic Books, Inc., 1980.

[QBM15] K. Quille, S. Bergin, and A. Mooney. PreSS#, a web-based educational system to predict programming performance. International Journal of Computer Science and Software Engineering, pages 178–189, 2015.

[RFP14] J. F. Roscoe, S. Fearn, and E. Posey. Teaching computational thinking by playing games and building robots. In Proceedings of the International Interactive Technologies and Games Conference (iTAG), October 2014.

[RGMLR17] Marcos Román-González, Jesús Moreno-León, and Gregorio Robles. Complementary tools for computational thinking assessment. In Proceedings of the International Conference on Computational Thinking Education (CTE 2017), S. C. Kong, J. Sheldon, and K. Y. Li (Eds.), The Education University of Hong Kong, pages 154–159, 2017.

[SS15] J. Shailaja and R. Sridaran. Computational thinking: the intellectual thinking for the 21st century. International Journal of Advanced Networking & Applications, Special Issue, 2015:39–46, May 2015.

[Van14] Jiří Vaníček. Bebras informatics contest: criteria for good tasks revised. In International Conference on Informatics in Schools: Situation, Evolution, and Perspectives, pages 17–28. Springer, 2014.

[Win06] Jeannette M. Wing. Computational thinking. Communications of the ACM, 49(3):33–35, 2006.

[Win08] Jeannette M. Wing. Computational thinking and thinking about computing. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 366(1881):3717–3725, 2008.