Developing a Computational Thinking Test using Bebras problems

James Lockwood
Dept. of Computer Science
Maynooth University, Maynooth, Co. Kildare, Ireland
james.lockwood@mu.ie

Aidan Mooney
Dept. of Computer Science
Maynooth University, Maynooth, Co. Kildare, Ireland
aidan.mooney@mu.ie

Abstract

Assessment is one of the major factors to consider when developing a new course or programme of study. When developing a course to teach Computer Science there are many forms this could take, one of which is linked to Computational Thinking. Whilst developing Computer Science to Go (CS2Go), an introductory course aimed at secondary school students, we have developed a Computational Thinking test based on problems developed for the international Bebras Challenge. This paper describes the content and development of the course, as well as some analysis of results from a year-long study with secondary school students and first-year undergraduate students. We believe, based on our analysis and previous research in the field, that our assessment, built from pre-existing Bebras problems, has the potential to offer educators another way of testing this increasingly discussed skill, Computational Thinking.

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: A. Piotrkowicz, R. Dent-Spargo, S. Dennerlein, I. Koren, P. Antoniou, P. Bailey, T. Treasure-Jones, I. Fronza, C. Pahl (eds.): Joint Proceedings of the CC-TEL 2018 and TACKLE 2018 Workshops, co-located with the 13th European Conference on Technology Enhanced Learning (EC-TEL 2018), 03-09-2018, published at http://ceur-ws.org

1 Introduction

1.1 Computer Science to Go (CS2Go)

Computer Science to Go (CS2Go) is a course designed to teach Computer Science topics with a focus on Computational Thinking. The idea to develop the course arose from a need identified by our research group while working with schools around Ireland through the PACT programme. We observed that teachers were keenly interested in delivering Computer Science lessons, and this led to more schools and teachers joining the programme. It has been our intention from the outset to expand the content on offer and to investigate what other topics and methods could be used [MDN+14]. Because there is little in the way of a full course in Computational Thinking, there was an opportunity and a desire to create a more complete and intensive course for Transition Year, with a view to developing it into a Junior Certificate short course. In Ireland the second-level school system includes an optional, one-year Transition Year (fourth year), taken after the Junior Cycle (first to third year) and before the two-year Leaving Certificate programme, which culminates in a final state exam.

In September 2016, teachers who had previously been involved with our group, as well as others including trainee teachers, were asked for their ideas and input on course design and content. This feedback, in conjunction with input from our group members and an extensive literature review, led to the setting out of the following aims for the course, presented in no particular order:

• Introduce students to Computer Science: what it is, how it can affect their lives, and how they can be involved.
• Improve students' CT and problem-solving skills by making them aware of a problem-solving process and how it can be beneficial in many subjects and areas of life.

• Improve students' understanding of Computer Science, including addressing the imbalance in participation rates across genders and the stereotyped view of who engages in Computer Science.

• Teach students Computer Science concepts such as Algorithms, Cryptography and Sorting/Searching Algorithms, with a focus not just on the concepts themselves but on real-world applications.

• Teach students programming to some level.

Students who have participated in PACT courses in the past have commented that the modules were both enjoyable and a good way to develop programming and other skills such as teamwork. However, they also expressed a desire for more practical applications, and we have worked to ensure that the topics and methods used in this course reflect their feedback [MDN+14]. The new course has since been designed and tested and has been well received by both students and teachers [LM18b].

1.2 Goals of the Test

Assessment is one of the key factors when designing and developing courses for any level of education. One of the things needed to analyse the success and impact of CS2Go was to find or develop a Computational Thinking test. It had to fit the following requirements:

• Be applicable to the target age range (15-17 years old).
• Allow for differentiation between strong and weaker students (i.e. have harder and easier questions).

• Allow students to complete the questions without any prior knowledge.

• Be completed within a 40-minute class period.

• Allow for a pre- and post-test of similar difficulty and content.

• Test students' Computational Thinking skills.

1.3 Computational Thinking

Denning [Den09] suggested that Computational Thinking (CT) has been around since the 1950s as algorithmic thinking, referring to the use of an ordered, precise set of steps to solve a problem and, where appropriate, the use of a computer to do so. Seymour Papert [Pap80] is credited with concretising CT in 1980, but it is since the contribution of Jeannette Wing [Win06], who popularised the term and brought it to the international community's attention, that more and more focus has been placed on CT within education. In her seminal paper, Wing outlined how she believed that all children should be taught CT, placing it alongside reading, writing and arithmetic in terms of importance. She further described it as representing a "universally applicable attitude and skill set everyone, not just computer scientists, would be eager to learn and use" [Win06].

Although academics have failed to agree on a universal definition of CT, Wing defines it as solving problems, designing systems, and understanding human behaviour by drawing on the concepts fundamental to computer science. She states that it is not programming and that it means "more than being able to program a computer. It requires thinking at multiple levels of abstraction" [Win06]. In 2008 Wing posed a question to the computer science, learning sciences and education communities: "What are effective ways of learning (teaching) CT by (to) children?" [Win08]. This in turn raised further questions about what concepts to teach, the order in which these might be taught, and which tools should be used to teach them.

In the meantime, a lot of work has been done around the world and across all levels of education to introduce CT into schools, colleges and after-school clubs, mainly through Computer Science or computing classes and courses. As CT is important to a computer scientist this makes sense; however, it should be noted that being able to think computationally, which includes skills such as decomposition, abstraction, algorithmic thinking and pattern matching, can be of benefit to all disciplines. Bundy [Bun07] has made this point, stating that CT concepts have been used in other disciplines and that the ability to think computationally is essential to every discipline.

A wide array of topics has been used to introduce CT to students. In addition to explicitly teaching students what CT is [GCP14, LHW16], students may be introduced to concepts such as abstraction [AD16, SS15], modelling [CN13], algorithms [AD16, FLM+15, MDN+14], decomposition [AD16] and problem-solving/critical-thinking skills [RFP14, SS15].

1.4 CT Assessment

Assessment of CT is in its infancy and, as such, there are not many methods available for educators to test what is increasingly being described as a central skill for students to possess.

Of note are one effort to develop a Computational Thinking test, called the Computational Thinking Test (CTt), and another project called Dr. Scratch. Dr. Scratch analyses Scratch projects to deliver a CT score based on a number of different metrics [MLRG15]. This is a great tool and we recommend it for analysing the Scratch projects developed in one module of CS2Go. However, as it works exclusively with Scratch, it did not suit our purpose of studying students' "general" CT skills pre- and post-course. The CTt has been developed as a series of multiple-choice questions presented online in either a "maze" or "canvas" interface, and a number of factors define the questions [Gon15]. The group behind the CTt have analysed these two metrics (the CTt and Dr. Scratch) alongside the Bebras problems [RGMLR17]. They found that the CTt was partially convergent with the other two and claim this is to be expected as the three assess CT from different perspectives. They claim that one strength of the CTt is that it can be done in "pure pre-test conditions".
This can allow early detection of problems, but it also does not allow for contextualised assessment. That is a strength of the Bebras problems, which use "real-life" questions, although the same authors claim the "psychometric properties of some of the problems are still far off being demonstrated".

With this being said, having assessed the various forms of Computational Thinking assessment that exist, both through a systematic literature review [LM18a] and through interactions with other researchers and educators, it was decided to develop a test for CS2Go based on the Bebras competition problems.

1.5 Bebras Problems

Bebras is an international competition which aims to promote Computer Science and Computational Thinking among school students of all ages. Participants are usually supervised by teachers and the challenge is performed at schools using computers or mobile devices. As part of their work in schools, the PACT group are involved in the Irish version of this challenge and have designed and used Bebras problems in order to provide teachers with resources to introduce students to Computational Thinking. The problems are designed to be 3-minute-long questions and require no prior knowledge of programming or Computer Science topics. All the problems are linked to topics in Computing such as Cryptography, Trees etc., and this allows them to be used to introduce students to these topics without students even realising they are learning them.

The fact that the Bebras problems are designed to test Computational Thinking skills means they are well suited to testing students' Computational Thinking skills before and after the course. Gouws et al. [GBW13] previously used the South African version of Bebras in a similar manner, and it was this that inspired the development of our own Computational Thinking test. Other studies have also been carried out on the Bebras problems to investigate their effectiveness and to compare them to other Computational Thinking tests [HM15, Van14, DS16, HM14].

2 Methodology

The current format of the Bebras challenge does not suit use as a comparative test, as the questions change each year. The challenge is also often conducted on PCs, and we wanted to allow teachers to administer it either on paper or online as desired. It was decided that 13 questions would be used in each test, with students allowed 35 minutes to complete them. This accounts both for the 3-minute design of the questions and for the fact that some of the questions are designed for a younger age group than the target demographic. It was hoped that each test would be as close as possible to the other in terms of difficulty level as well as question topic and type. To do this, many questions from Bebras challenges across the world were examined and critiqued.

The questions used in the UK challenges were deemed most appropriate, and the contents of the tests were sourced from the 2015 and 2016 challenges. For the target age group (15-17-year-olds) the UK challenge involves 18 multiple-choice questions over 40 minutes. As explained previously, this was adjusted slightly for our purposes to be shorter, and it also allowed for some non-multiple-choice questions. The first criterion for the tests was to ensure that they were as close in terms of difficulty level as possible. The UK Bebras challenge is broken into six age groups, as presented in Table 1.

Table 1: Bebras UK Sections
Group Name | Year Group | Approx. age
Kits | 2 & 3 | 6-8
Castors | 4 & 5 | 8-10
Juniors | 6 & 7 | 10-12
Intermediate | 8 & 9 | 12-14
Seniors | 10 & 11 | 14-16
Elites | 12 & 13 | 16-18

Each age group is then further divided into three Sections, namely Section A, Section B and Section C. Questions in Section A are considered the easiest, with Section C problems being the more complex. Questions that are submitted for the Bebras challenge are reviewed by a panel of experts in Computing education who are involved in the challenge. Questions that are accepted for either the qualification rounds or the final challenge are often used in multiple age groups and across the three Sections.

To ensure that the two created tests were as similar in difficulty as possible, these ratings were used to select questions for each test, ensuring that corresponding problems were used in at least one common Section and age group.
The chosen corresponding problems for the tests, along with the Sections they have in common, can be seen in Table 2. For the complete set of problems consult goo.gl/XDRHbq.

Table 2: Matching sections of the tests
Test 1 Question | Common Sections | Test 2 Question
Bracelet | Kits B, Castors A | Bebras Painting
Animation | Castors B, Juniors A | Bottles
Animal Competition | Castors B, Intermediate A | Party guests
Cross Country | Intermediate A | Tube System
Stack computer | Senior B, Elite A | Pirate Hunters
Throw the dice | Juniors C | Magic potion
Drawing stars | Intermediate B | Concurrent directions
Beaver lunch | Senior B | Theatre
You won't find it | Intermediate C, Elite A | Secret messages
Bowl Factory | Intermediate C, Elite B | Triangles
Fireworks | Senior C | Scanner code
Kangaroo | Elite C | The Game
Spies | Elite C | B-enigma

The second criterion for the tests was to have similar topics and styles for the questions where possible, and to have these topics relate to areas covered in the course. This was not as high a priority as the difficulty, so questions were considered even when this wasn't possible. Table 3 presents the topics covered by each question in each test.

Table 3: Topics of the questions
Test 1 | Topic | Test 2 | Topic
Bracelet | Pattern matching | Bebras Painting | Algorithms
Animation | Attributes and variables | Bottles | Sorting
Animal Competition | Data ordering | Party guests | Graphs
Stack computer | Stacks | Pirate Hunters | Graphs
Throw the dice | If-then-else | Magic potion | Logic & binary
Drawing stars | Objects | Concurrent directions | Parallel instructions
Beaver lunch | Trees | Theatre | Sequences
You won't find it | Ciphering | Secret messages | Ciphering
Bowl Factory | Sorting | Triangles | Iteration, pattern matching
Fireworks | Encoding | Scanner code | Pixels
Spies | Gossip problem | B-enigma | Encrypting

Prior to either of the tests being used, they were trialled by a small group to ensure that the questions were clear and made sense, that our timing (35 minutes) was reasonable, and that both sets of questions appeared similar in terms of difficulty. The group found that the second test was perhaps slightly harder, but that it was doable in 35 minutes and that the questions were in general clear.

To further assess that the two tests are similar in difficulty, and to validate their effectiveness, the questions were sent out to teachers, undergraduate and postgraduate students and third-level academic staff with instructions on how to rate the questions' difficulty. The hope was that this sample of different demographic and career groups would not only show that the two tests are similar in difficulty but also allow us to weight specific questions, or one of the tests, accordingly if there was a discrepancy. Tables 4, 5 and 6 present the qualifications and areas of work of the participants. There was a mixture of genders and ages but this data was not collected; this group will be referred to as the panel from now on.

Table 4: Qualification profile of the panel
Highest Qualification | No. of participants
PhD | 5
Masters | 1
Bachelors Degree | 10
Leaving Certificate | 3
Unspecified | 1

Table 5: Job profile of the panel
Job Title | No. of participants
Lecturer | 5
Primary school teacher | 2
Secondary school teacher | 1
Tutor/Postgraduate Student | 5
Youth worker | 2
Nurse/Veterinary Nurse | 2
Undergraduate Student | 2
Unspecified | 1

Table 6: Area of work of the panel
Area of work | No. of participants
Computer Science | 9
Irish | 1
Mathematics | 1
Electronic Engineering | 1
Youth work | 2
Medicine | 2
Teaching | 3
Unspecified | 1

We asked the panel to rank the questions for us on two scales. Twenty people completed this task for Test 1, with 18 of those also completing it for Test 2. The first scale rated the questions in each test from easiest to hardest, giving each question a ranking from 1 to 13. To further enhance this ranking a second scale was needed, as two questions might be classified as the easiest pair yet still have a big gap in difficulty between them; the same could be true of any two questions. Since each test had 13 questions, it was decided that a scale from 1-10 would not allow the panel to be precise and would in fact limit the ranking. A scale of 1-20 was decided upon, with 1 being easiest and 20 hardest. The panel weren't given further instruction unless they requested it; they were free to rank the questions as they saw fit.
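The per-question averages reported in the next section can be reproduced from the raw panel responses in a few lines. The sketch below is illustrative only: it assumes a simple in-memory layout for the panel data (one entry per rater per question) and is not the analysis pipeline actually used in the study.

```python
# Minimal sketch (assumed data layout, not the study's actual pipeline):
# average each question's 1-13 rank and 1-20 difficulty score over the panel.
from statistics import mean

# Hypothetical panel responses: {rater: {question: (rank_1_to_13, score_1_to_20)}}
panel = {
    "rater_a": {"Bracelet": (1, 2), "Animation": (2, 5), "Spies": (13, 16)},
    "rater_b": {"Bracelet": (2, 4), "Animation": (3, 6), "Spies": (12, 14)},
}

questions = sorted({q for ratings in panel.values() for q in ratings})
summary = {
    q: (
        mean(r[q][0] for r in panel.values() if q in r),  # mean 1-13 rank
        mean(r[q][1] for r in panel.values() if q in r),  # mean 1-20 score
    )
    for q in questions
}

# Order from easiest to hardest by mean 1-20 score, as in Tables 7 and 11.
for q, (avg_rank, avg_score) in sorted(summary.items(), key=lambda kv: kv[1][1]):
    print(f"{q}: mean rank {avg_rank:.2f}, mean difficulty {avg_score:.2f}")
```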
3 Results

Table 7 presents the scores each question received when the panel rated the Test 1 questions on the scale from 1-20. The questions are presented from easiest to hardest based on the average scores in the table.

Table 7: Rating of Test 1 Questions out of 20
Rank | Question | Average out of 20
1 | Bracelet | 3.2
2 | Animation | 5.6
3 | Cross Country | 6.15
4 | Beaver Lunch | 7.6
5 | Drawing Stars | 7.6
6 | Throw the Dice | 7.7
7 | You Won't Find It | 8.75
8 | Animal Competition | 8.95
9 | Kangaroo | 9.05
10 | Fireworks | 9.45
11 | Bowl Factory | 12.6
12 | Stack Computer | 13.5
13 | Spies | 15.1
Average | | 8.87

There isn't much of a difference between this ranking and the ranking participants gave the questions out of 13. This is to be expected, and the two rankings are presented in Table 8 to show the comparison.

Table 8: Test 1 Comparison
Question (ranked 1-13) | Rank | Question (ranked 1-20)
Bracelet | 1 | Bracelet
Animation | 2 | Animation
Cross Country | 3 | Cross Country
Throw the Dice | 4 | Beaver Lunch
Drawing Stars | 5 | Drawing Stars
Beaver Lunch | 6 | Throw the Dice
You Won't Find It | 7 | You Won't Find It
Kangaroo | 8 | Animal Competition
Animal Competition | 9 | Kangaroo
Fireworks | 10 | Fireworks
Stack Computer | 11 | Bowl Factory
Bowl Factory | 12 | Stack Computer
Spies | 13 | Spies
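The similarity between the two orderings in Table 8 can also be quantified with a rank correlation. The paper does not report such a statistic; the following is simply an illustrative check using SciPy, with the question order transcribed from Table 8.

```python
# Illustrative check (not reported in the paper): Spearman rank correlation
# between the 1-13 ordering and the 1-20 ordering of the Test 1 questions.
from scipy.stats import spearmanr

order_1_13 = ["Bracelet", "Animation", "Cross Country", "Throw the Dice",
              "Drawing Stars", "Beaver Lunch", "You Won't Find It", "Kangaroo",
              "Animal Competition", "Fireworks", "Stack Computer",
              "Bowl Factory", "Spies"]
order_1_20 = ["Bracelet", "Animation", "Cross Country", "Beaver Lunch",
              "Drawing Stars", "Throw the Dice", "You Won't Find It",
              "Animal Competition", "Kangaroo", "Fireworks", "Bowl Factory",
              "Stack Computer", "Spies"]

position_in_20 = {q: i for i, q in enumerate(order_1_20, start=1)}
rho, p_value = spearmanr(list(range(1, 14)),
                         [position_in_20[q] for q in order_1_13])
print(f"Spearman rho = {rho:.3f}, p = {p_value:.4f}")
```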
It should be noted that Beaver Lunch, Drawing Stars and Throw the Dice were rated as having almost exactly the same level of difficulty, with scores of 7.6, 7.6 and 7.7 out of 20 respectively (see Table 7). It should also be noted that there is a large jump in difficulty from 10th to 11th position (Fireworks to Bowl Factory). Kangaroo and Fireworks are rated 9.05 and 9.45 respectively, but the scores then jump significantly to 12.6, 13.5 and 15.1 for Bowl Factory, Stack Computer and Spies. A similar gap can be seen when going from the first three questions to the 4th question: Bracelet, Animation and Cross Country are rated 3.2, 5.6 and 6.15 respectively, which in itself covers a broad range, and the score then jumps up to 7.6 for Beaver Lunch and Drawing Stars.

This lines up roughly with the age categories the questions were used in during the Bebras competition. Table 9 presents a comparison between these three orderings. For the original category and UK results we have used the highest category each question was entered in, which can be seen in the table.

Table 9: Test 1 Extensive Ranking
Rank | Col 1: Ranking from 1-13 | Col 2: Scores from 1-20 | Col 3: Bebras Category (Highest)
1 | Bracelet (1.75) | Bracelet (3.2) | Bracelet (Inter A)
2 | Animation (4.8) | Animation (5.6) | Animation (Inter A)
3 | Cross Country (4.95) | Cross Country (6.15) | Animal Competition (Inter A)
4 | Throw the Dice (5.7) | Beaver Lunch (7.6) | Cross Country (Inter A)
5 | Drawing Stars (6.3) | Drawing Stars (7.6) | Beaver Lunch (Senior B)
6 | Beaver Lunch (6.65) | Throw the Dice (7.7) | Throw the Dice (Senior B)
7 | You Won't Find It (7.15) | You Won't Find It (8.75) | Fireworks (Senior C)
8 | Kangaroo (7.45) | Animal Competition (8.95) | You Won't Find It (Elite A)
9 | Animal Competition (7.6) | Kangaroo (9.05) | Stack Computer (Elite A)
10 | Fireworks (7.95) | Fireworks (9.45) | Drawing Stars (Elite A)
11 | Stack Computer (9.6) | Bowl Factory (12.6) | Bowl Factory (Elite B)
12 | Bowl Factory (9.85) | Stack Computer (13.5) | Kangaroo (Elite C)
13 | Spies (11.25) | Spies (15.1) | Spies (Elite C)

If we use the rankings in each of these columns we can attempt to rank the questions across all three columns to give an overall ranking. For example, Bracelet was ranked 1 in Column 1 and 1 in Column 2, giving a score of 2 if we simply add these numbers together. If scores are identical in any of the columns then the tied questions are given the same value, e.g. in Column 2 Beaver Lunch and Drawing Stars both have a score of 7.6, so they'll both be given a value of 5 (i.e. that of the highest-placed question of the two). In Column 3 a value is given relating to the position of the highest question; for example, You Won't Find It, Stack Computer and Drawing Stars were all used in the Elite A category, so they will all be given a value of 10, as Drawing Stars is the highest placed in the list.

Doing this for each question we can then rank them from 1 to 13, with 1 being the easiest question (the lowest total across the three columns) and 13 being the hardest (the largest total across the three columns). This ranking is shown in Table 10.

Table 10: Test 1 Ranking
Rank | Question | Total Score | Breakdown*
1 | Bracelet | 6 | 1+1+4
2 | Animation | 8 | 2+2+4
3 | Cross Country | 10 | 3+3+4
4 | Throw the Dice | 14 | 4+6+6
5 | Beaver Lunch | 17 | 6+5+6
6 | Drawing Stars | 20 | 5+5+10
7 | Animal Competition | 21 | 9+8+4
8 | You Won't Find It | 24 | 7+7+10
9 | Fireworks | 27 | 10+10+7
10 | Kangaroo | 30 | 8+9+13
11 | Stack Computer | 33 | 11+12+10
 | Bowl Factory | 33 | 12+11+10
13 | Spies | 39 | 13+13+13
*Rank of (col 1 + col 2 + col 3) from Table 9
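The combination rule just described (per-column positions, ties sharing the value of the highest-placed member, and the three positions summed) is mechanical enough to express in a few lines. The sketch below is our reading of that rule applied to toy data; it is not the authors' own script, and the data layout is assumed.

```python
# Sketch of the combination behind Tables 10 and 14 (toy data, assumed layout).
def positions(column):
    """column: list of (question, key) ordered easiest -> hardest.
    Questions whose keys tie all receive the position of the last-placed
    member of the tie, matching the rule described above for Columns 2 and 3."""
    last_pos = {}
    for i, (_, key) in enumerate(column, start=1):
        last_pos[key] = i                    # keep the largest position per key
    return {question: last_pos[key] for question, key in column}

# Toy example with three questions; keys are the 1-13 means, the 1-20 means and
# the highest Bebras category, listed in the order they appear in Table 9.
col1 = positions([("Bracelet", 1.75), ("Animation", 4.8), ("Spies", 11.25)])
col2 = positions([("Bracelet", 3.2), ("Animation", 5.6), ("Spies", 15.1)])
col3 = positions([("Bracelet", "Inter A"), ("Animation", "Inter A"),
                  ("Spies", "Elite C")])

totals = {q: col1[q] + col2[q] + col3[q] for q in col1}
for rank, (q, total) in enumerate(sorted(totals.items(), key=lambda kv: kv[1]), 1):
    print(rank, q, total)
```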
The rankings shown in Table 10 would allow us to weight questions by awarding a higher mark for a correct answer on harder questions and a lower mark for a correct answer on easier questions. This hasn't been deemed necessary at this stage, but further results might lead us to do so, especially for hard questions such as Spies.

What we can deduce from these rankings is that the questions can be split into three difficulty levels. Questions in ranks 1-4 all have a total of less than 15 and can be seen as the easiest four questions; questions 5-10 have totals of between 15 and 30 and can be seen as intermediate questions; and questions 11-13 have totals of over 30 and can be seen as the hardest questions.

3.1 Test 2

Table 11 presents the scores for each Test 2 question as rated by the panel on the scale from 1-20, again presented from easiest to hardest. Unlike in Test 1, there appears to be more of a difference between this ranking and the ratings given from 1-13, as shown in Table 12.

Table 11: Rating of Test 2 Questions out of 20
Rank | Question | Average out of 20
1 | Bebras Painting | 3.72
2 | Tube System | 5.94
3 | Concurrent Directions | 7.72
4 | Magic Potion | 8
5 | Theatre | 8.06
6 | Bottles | 8.44
7 | Party Guest | 8.44
8 | Secret Messages | 8.78
9 | Triangles | 10.89
10 | Scanner Code | 11.56
11 | B-Enigma | 11.78
12 | The Game | 11.78
13 | Pirate Hunters | 13.44
Average | | 9.12

Table 12: Test 2 Comparison
Question (ranked 1-13) | Rank | Question (ranked 1-20)
Bebras Painting | 1 | Bebras Painting
Tube System | 2 | Tube System
Magic Potion | 3 | Concurrent Directions
Party Guest | 4 | Magic Potion
Bottles | 5 | Theatre
Theatre | 6 | Bottles
Concurrent Directions | 7 | Party Guest
Secret Messages | 8 | Secret Messages
Scanner Code | 9 | Triangles
B-Enigma | 10 | Scanner Code
Triangles | 11 | B-Enigma
The Game | 12 | The Game
Pirate Hunters | 13 | Pirate Hunters

It is interesting to note that with this test there appear to be a few more discrepancies between the two rankings. From questions 3-11 several questions are "out of place", some by just one rank (like Theatre and Bottles in ranks five and six) or, in the case of Concurrent Directions and Party Guest, by three or four ranks. The reason for this is that these questions were all rated very similarly by most people. In terms of the difficulty score (from 1-20), there is only one point separating Concurrent Directions (7.72) in third position and Secret Messages (8.78) in eighth position, which is what leads to the slight mismatch in those positions. Similarly, there is only one point separating Triangles (10.89) in ninth place and The Game (11.78) in 12th place. Also of interest is the fact that Bottles and Party Guest (both 8.44), and The Game and B-Enigma (both 11.78), were rated with the same level of difficulty.

These rankings line up roughly with the age categories the questions were used in during the Bebras competition. Table 13 presents a comparison between these three orderings. For the original category and UK results we have used the highest category each question was entered in, which can be seen in the table.

Table 13: Test 2 Extensive Ranking
Rank | Column 1: Our analysis (ranking from 1-13) | Column 2: Our analysis (scores from 1-20) | Column 3: Bebras Category (Highest)
1 | Bebras Painting (2.22) | Bebras Painting (3.72) | Bebras Painting (Castors A)
2 | Tube System (4.33) | Tube System (5.94) | Bottles (Junior A)
3 | Magic Potion (5.39) | Concurrent Directions (7.72) | Party Guests (Inter A)
4 | Party Guest (5.78) | Magic Potion (8) | Tube System (Inter A)
5 | Bottles (6.22) | Theatre (8.06) | Concurrent Directions (Senior A)
6 | Theatre (6.5) | Bottles (8.44) | Pirate Hunters (Elite A)
7 | Concurrent Directions (6.78) | Party Guest (8.44) | Magic Potion (Elite A)
8 | Secret Messages (7) | Secret Messages (8.78) | Secret Messages (Elite A)
9 | Scanner Code (8.89) | Triangles (10.89) | Theatre (Elite B)
10 | B-enigma (8.94) | Scanner Code (11.56) | Scanner Code (Elite B)
11 | Triangles (9) | B-enigma (11.78) | Triangles (Elite B)
12 | The Game (9.89) | The Game (11.78) | The Game (Elite C)
13 | Pirate Hunters (9.94) | Pirate Hunters (13.44) | B-enigma (Elite C)

If we use the rankings in each of these columns we can rank the questions across all three columns to give an overall ranking. For example, Bebras Painting was ranked 1 in Column 1 and 1 in Column 2, giving a score of 2 when added together. If scores are identical in any of the columns then the tied questions are given the same value, e.g. in Column 2 B-enigma and The Game have the same score, so they'll both be given a value of 12 (i.e. that of the highest-placed question of the two). In Column 3 the value of the highest-placed question is given; for example, Theatre, Scanner Code and Triangles were all used in Elite B, so they will all be given a value of 11, as Triangles is the highest placed in the list.

Doing this for each question we can then rank them from 1 to 13, with 1 being the easiest question (the lowest total across the three columns) and 13 being the hardest (the largest total across the three columns). This ranking is shown in Table 14.

Table 14: Test 2 Ranking
Rank | Question | Total Score | Breakdown*
1 | Bebras Painting | 3 | 1+1+1
2 | Tube System | 8 | 2+2+4
3 | Bottles | 14 | 5+7+2
4 | Party Guest | 15 | 4+7+4
 | Magic Potion | 15 | 3+4+8
 | Concurrent Directions | 15 | 7+3+5
7 | Theatre | 22 | 6+5+11
8 | Secret Messages | 24 | 8+8+8
9 | Scanner Code | 30 | 9+10+11
10 | Triangles | 31 | 11+9+11
11 | Pirate Hunters | 34 | 13+13+8
12 | B-enigma | 35 | 10+12+13
13 | The Game | 37 | 12+12+13
*Rank of (col 1 + col 2 + col 3) from Table 13

As with Test 1, the rankings shown in Table 14 would allow us to weight questions by awarding a higher mark for a correct answer on harder questions and a lower mark for a correct answer on easier questions. This hasn't been deemed necessary at this stage, but further results might lead us to do so for especially hard questions such as The Game.

Similarly to Test 1, we can deduce from these rankings that the questions can be split into three difficulty levels. Questions in ranks 1-3 all have a total of less than 15 and can be seen as the easiest questions; questions 4-9 have totals of between 15 and 30 and can be seen as intermediate questions; and questions 10-13 have totals of over 30 and can be seen as the hardest questions. These divisions are similar to the divisions shown in Test 1.
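The weighting mentioned above has not been applied to any of the results reported in this paper; if it were, it could be as simple as attaching a mark per difficulty tier. In the sketch below the tier boundaries follow the split described in Section 3 (totals under 15, 15-30, and over 30), while the marks per tier are purely hypothetical.

```python
# Hypothetical weighting sketch; not used in the reported results.
TIER_MARKS = {"easy": 1, "intermediate": 2, "hard": 3}   # assumed weights

def tier(total):
    """Difficulty tier from the combined totals in Table 10 / Table 14."""
    if total < 15:
        return "easy"
    if total <= 30:
        return "intermediate"
    return "hard"

def weighted_score(question_totals, answers):
    """question_totals: {question: combined total}; answers: {question: bool}."""
    return sum(TIER_MARKS[tier(question_totals[q])]
               for q, correct in answers.items() if correct)

# Example: one correct answer from each tier of Test 2 -> 1 + 2 + 3 = 6 marks.
totals = {"Bebras Painting": 3, "Secret Messages": 24, "The Game": 37}
print(weighted_score(totals, {q: True for q in totals}))
```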
4 Findings

Using the average difficulty of the two tests as presented in Table 7 and Table 11, it can be seen that both tests are of a similar difficulty level. Test 1 questions were rated on average at a perceived difficulty of 8.87 and Test 2 questions at 9.12. This leads us to conclude that the two tests have a similar difficulty rating.

These tests were run over the course of the 2017-18 academic year in a number of schools as well as with a first-year undergraduate CS course. The test was run in schools as part of the wider CS2Go roll-out, and it was decided that running it with the undergraduate students would be helpful as they are a larger, more consistent sample. It was also felt that those students could benefit from the problem-solving aspect of the assessment. With both cohorts it was hoped to see whether the test could be completed in the 35-minute time period allotted. It was also hoped that the results would show that the test targets students' Computational Thinking skills. This is hard to define precisely, so we used students' previous mathematics and programming experience as metrics to compare groups. Mathematical ability has been shown to be a predictor of success in programming [QBM15], and most would agree that programming is a specific way of testing CT skills.

4.1 Overall Results

4.1.1 School data

A total of 200 students took at least one of the problem-solving tests. Of those 200 students, 187 took Test 1 and 76 took Test 2. The decrease in numbers is due to some schools not completing all of the feedback and assessment at the end of the year. This could have been because their teachers did not use the content much or did not have time to re-test the students.

Table 15 shows the results of the tests grouped by those who took at least one test and those who took both tests. It can be seen that students in both groups performed slightly better in the second test than in the first. For the whole population this is to a significant level (T-score = 2.473, P-value = 0.014), but for those who took both tests there is no significant difference (T-score = 0.159, P-value = 0.873).

Table 15: School results of the tests, where n is the number of students taking the test
 | Average of those who took at least one test | Average of those who took both tests
Test 1 | 5.806 (n = 187) | 6.527 (n = 55)
Test 2 | 6.627 (n = 76) | 6.6 (n = 55)
These scores are all out of 13.

4.1.2 Undergraduate Students

A total of 292 students took at least one of the problem-solving tests. Of those 292 students, 263 took Test 1 and 180 took Test 2. The decrease in numbers is due to students changing course, only needing to complete one semester of CS, and other unrelated circumstances.

Table 16 shows the results of the tests grouped by those who took at least one test and those who took both tests. It can be seen that a total of 174 took both tests. Students performed marginally better in Test 2 compared to Test 1, but this isn't a significant difference (T-score = 0.129, P-value = 0.897). This increase is also found across the whole population, with the averages being 7.689 for Test 1 (n = 263) and 7.933 for Test 2 (n = 180) (T-score = 1.17, P-value = 0.24).

Table 16: Undergraduate results of the tests, where n is the number of students taking the test
 | Average of those who took at least one test | Average of those who took both tests
Test 1 | 7.689 (n = 263) | 7.988 (n = 174)
Test 2 | 7.933 (n = 180) | 8.03 (n = 174)
These scores are all out of 13.

As stated in Section 1.2, one of the hopes of these studies was to show that the Bebras problems challenge students' Computational Thinking skills. For this we have looked at students who had previous programming experience and those who took Higher Level Mathematics at the Leaving Certificate.

Table 17 shows that students who took Higher Level maths performed significantly better in both Test 1 (T-score = 2.768, P-value = 0.006) and Test 2 (T-score = 3.409, P-value = 0.001). This is encouraging, as mathematical ability and Computational Thinking can be seen as closely related skill-sets. Also of interest, those who studied Ordinary Level Mathematics performed slightly worse in Test 2 than in Test 1, whereas those who studied Higher Level improved slightly; it should be noted that neither group's scores changed significantly over the two tests.

It can also be seen that those who had previous programming experience performed better in Test 1 than those who had no experience, but not to a significant level (T-score = 0.853, P-value = 0.395). Interestingly, not only was the numerical gap closed in this demographic by Test 2, but it had swung the other way, with those who had no previous experience out-performing their peers, although neither group's scores changed to a significant level. This is encouraging, as it could help to show that the content covered by introductory CS courses, namely programming and low-level theory, is beneficial for Computational Thinking skills.

Table 17: Undergraduate demographic comparisons
Demographic | Test 1 Avg | Test 2 Avg
Studied OL (n = 44) | 7.386 | 7.273
Studied HL (n = 110) | 8.445 | 8.509
PP (n = 71) | 8.225 | 8.085
NPP (n = 88) | 7.92 | 8.102
OL = Ordinary Level Mathematics; HL = Higher Level Mathematics; PP = Previous Programming Experience; NPP = No Previous Programming Experience
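The T-scores and P-values quoted in this section come from comparing the score distributions of two groups of students. A sketch of that kind of comparison is shown below; since the per-student raw scores are not published here, the two lists are placeholders, and the Welch (unequal-variance) variant is used as one reasonable choice rather than necessarily the exact test applied in the study.

```python
# Placeholder sketch of a two-group comparison (e.g. Higher Level vs Ordinary
# Level mathematics on Test 1); the score lists are invented for illustration.
from scipy.stats import ttest_ind

hl_scores = [9, 8, 10, 7, 11, 8, 9]   # hypothetical Test 1 scores out of 13
ol_scores = [6, 7, 8, 5, 9, 7]        # hypothetical Test 1 scores out of 13

t_stat, p_value = ttest_ind(hl_scores, ol_scores, equal_var=False)
print(f"T-score = {t_stat:.3f}, P-value = {p_value:.3f}")
```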
One way in which we can compare the two different cohorts is by looking at the percentage of students who got each question right. We would expect the undergraduate students to perform better on most, if not all, questions across both tests.

4.2 Bebras Test 1

It can be seen from Table 18 that in Test 1 the undergraduate students performed better than those in secondary school on almost every question. Many of the gaps between the two groups are in the range of 15-30%. This is to be expected, as the undergraduates have been through at least two more years of education, and one of the expressed goals of the Leaving Certificate is to develop students into critical and creative thinkers (https://www.curriculumonline.ie/getmedia/161b0ee4-706c-4a7a-9f5e-7c95669c629f/KS_Framework.pdf), which is all connected to Computational Thinking. The only question where the secondary school students out-performed the undergraduates was the Fireworks question.

Table 18: Our Results Test 1
Question | Undergrads (n=277) | Schools (n=186)
Bracelet | 94.6% | 94.1%
Animation | 70.8% | 56.5%
Cross Country | 68.6% | 45.2%
Throw the Dice | 77.9% | 50.5%
Drawing stars | 82.3% | 60.2%
Beaver Lunch | 33.2% | 29%
You won't find it | 89.2% | 74.2%
Bowl Factory | 18.4% | 11.8%
Fireworks | 42.2% | 47.8%
Kangaroo | 56.7% | 40.3%
Spies | 29.6% | 17.2%
Animal Competition | 64.9% | 39.8%
Stack Computer | 44% | 13.9%
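The percentages in Tables 18 and 19 are simply the proportion of each cohort answering a question correctly. A small sketch of that computation, over a hypothetical set of response records, is given below.

```python
# Sketch: per-question percentage correct by cohort (hypothetical records).
from collections import defaultdict

# Each record: (cohort, question, answered_correctly)
responses = [
    ("undergrad", "Bracelet", True), ("undergrad", "Bracelet", False),
    ("school", "Bracelet", True), ("school", "Bracelet", True),
]

counts = defaultdict(lambda: [0, 0])      # (cohort, question) -> [correct, total]
for cohort, question, correct in responses:
    counts[(cohort, question)][0] += int(correct)
    counts[(cohort, question)][1] += 1

for (cohort, question), (correct, total) in sorted(counts.items()):
    print(f"{question} ({cohort}): {100 * correct / total:.1f}% correct of {total}")
```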
4.3 Bebras Test 2

From Table 19 we can see that, as in Test 1, the undergraduate students performed better than the secondary school students in Test 2. The gaps this time are generally smaller, with most being around 10%. The secondary school students again performed better on one question, The Game. This is interesting because, based on the percentage of students who got the question right and on our own analysis discussed in Section 3, The Game was identified as being one of the hardest questions across both tests.

Table 19: Our Results Test 2
Question | Undergrad (n=197) | School (n=75)
Bebras Painting | 81.7% | 58.7%
Bottles | 90.4% | 84%
Party Guests | 88.3% | 70.7%
Tube System | 67.5% | 57.3%
Pirate Hunters | 42.1% | 33.3%
Magic Potion | 83.2% | 69.3%
Concurrent Directions | 78.7% | 66.7%
Theatre | 32.5% | 21.3%
Secret Messages | 86.8% | 86.7%
Triangles | 47.2% | 34.7%
Scanner Code | 36.5% | 32%
The Game | 3.6% | 5.3%
B-Enigma | 51.3% | 42.7%

5 Conclusions

Based on the analysis from our panel we can conclude that the two tests are of approximately equal difficulty. As discussed in Sections 1.4 and 1.5, the Bebras problems have been developed to test participants' CT skills, and they compare well to other existing tests such as the CTt. This is further backed up by our findings from the undergraduate students, as those who had previously programmed and who studied Higher Level mathematics achieved higher results in Test 1.

One advantage of this test is its ability to be administered both online and through paper question-and-answer sheets. However, it is clear that, with technology use in schools becoming more commonplace, online submission is preferable. This is also true from a data-collection point of view, as it can save time and effort as well as provide almost immediate results. The results presented here were collected via both paper answer sheets and Google Forms for online responses. This has worked as a stop-gap, but a more robust and controlled system is needed. To that end, a web system for the entire CS2Go course, as well as the assessment tools described here, has been developed over the past year. It will go live this summer, and it is hoped this will allow easier access for both educators and our research group to data and course content.

One interesting development that we plan to pursue is to create "equivalent" Bebras problems. Each Bebras exercise is usually based around a specific CS-related concept or problem. To make the tests equivalent not only in difficulty but also in topic, we would have to develop Bebras exercises which have the same underlying concept or idea but a different story or real-world application. This is no easy task, but if a method could be developed it would not only help our test but also allow the Bebras challenge itself to develop similar questions year after year.

An area of interest in our research group is methods of predicting success in programming courses. Being able to implement interventions to help students identified as potentially struggling is vitally important and beneficial to all educators. If this test could be shown to predict success in programming, or general academic success, it could be a helpful tool for educators. We plan to use the data obtained from the undergraduate students and their final grades to begin to see if this is possible.

6 Acknowledgements

Thank you to all of those who submitted rankings.
References

[AD16] S. Atmatzidou and S. Demetriadis. Advancing students' computational thinking skills through educational robotics: A study on age and gender relevant differences. Robotics and Autonomous Systems, 75:661–670, 2016.

[Bun07] A. Bundy. Computational thinking is pervasive. Journal of Scientific and Practical Computing, 1(2):67–69, 2007.

[CN13] M. E. Caspersen and P. Nowack. Computational thinking and practice: A generic approach to computing in Danish high schools. In Proceedings of the 15th Australasian Computing Education Conference, pages 137–143, January 2013.

[Den09] Peter J. Denning. The profession of IT: Beyond computational thinking. Communications of the ACM, 52(6):28–30, 2009.

[DS16] V. Dagiene and G. Stupuriene. Bebras: a sustainable community building model for the concept based learning of informatics and computational thinking. Informatics in Education, 2016.

[FLM+15] R. Folk, G. Lee, A. Michalenko, A. Peel, and E. Pontelli. GK-12 DISSECT: Incorporating computational thinking with K-12 science without computer access. In Proceedings of the Frontiers in Education Conference (FIE), October 2015.

[GBW13] L. Gouws, K. Bradshaw, and P. Wentworth. First year student performance in a test for computational thinking. In Proceedings of the South African Institute for Computer Scientists and Information Technologists Conference, pages 271–277. ACM, October 2013.

[GCP14] S. Grover, S. Cooper, and R. Pea. Assessing computational learning in K-12. In Proceedings of the Conference on Innovation & Technology in Computer Science Education, pages 57–62, June 2014.

[Gon15] M. R. González. Computational thinking test: Design guidelines and content validation. In Proceedings of EDULEARN15, 2015.

[HM14] P. Hubwieser and A. Mühling. Playing PISA with Bebras. In Proceedings of the 9th Workshop in Primary and Secondary Computing Education (WiPSCE), 2014.

[HM15] P. Hubwieser and A. Mühling. Investigating the psychometric structure of the Bebras contest: towards measuring computational thinking skills. In Learning and Teaching in Computing and Engineering (LaTiCE), 2015.

[LHW16] W. L. Li, C. F. Hu, and C. C. Wu. Teaching high school students computational thinking with hands-on activities. In Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education, pages 371–371, July 2016.

[LM18a] J. Lockwood and A. Mooney. Computational thinking in secondary education: Where does it fit? A systematic literary review. International Journal of Computer Science Education in Schools, 2018:41–60, January 2018.

[LM18b] J. Lockwood and A. Mooney. A pilot study investigating the introduction of a computer-science course focusing on computational thinking at second level. The Irish Journal of Education/Iris Éireannach an Oideachais, forthcoming 2018.

[MDN+14] A. Mooney, J. Duffin, T. Naughton, R. Monahan, J. Power, and P. Maguire. PACT: An initiative to introduce computational thinking to second-level education in Ireland. In Proceedings of the International Conference on Engaging Pedagogy (ICEP), 2014.

[MLRG15] J. Moreno-León, G. Robles, and M. Román-González. Dr. Scratch: Automatic analysis of Scratch projects to assess and foster computational thinking. Revista de Educación a Distancia, 2015.

[Pap80] Seymour Papert. Mindstorms: Children, Computers, and Powerful Ideas. Basic Books, Inc., 1980.

[QBM15] K. Quille, S. Bergin, and A. Mooney. PreSS#, a web-based educational system to predict programming performance. International Journal of Computer Science and Software Engineering, pages 178–189, 2015.

[RFP14] J. F. Roscoe, S. Fearn, and E. Posey. Teaching computational thinking by playing games and building robots. In Proceedings of the International Interactive Technologies and Games Conference (iTAG), October 2014.

[RGMLR17] Marcos Román-González, Jesús Moreno-León, and Gregorio Robles. Complementary tools for computational thinking assessment. In Proceedings of the International Conference on Computational Thinking Education (CTE 2017), S. C. Kong, J. Sheldon, and K. Y. Li (Eds.), The Education University of Hong Kong, pages 154–159, 2017.

[SS15] J. Shailaja and R. Sridaran. Computational thinking: the intellectual thinking for the 21st century. International Journal of Advanced Networking & Applications, Special Issue, 2015:39–46, May 2015.

[Van14] Jiří Vaníček. Bebras informatics contest: criteria for good tasks revised. In International Conference on Informatics in Schools: Situation, Evolution, and Perspectives, pages 17–28. Springer, 2014.

[Win06] Jeannette M. Wing. Computational thinking. Communications of the ACM, 49(3):33–35, 2006.

[Win08] Jeannette M. Wing. Computational thinking and thinking about computing. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 366(1881):3717–3725, 2008.