=Paper=
{{Paper
|id=Vol-1590/paper-04
|storemode=property
|title=Simple Metrics for Curriculum Analytics
|pdfUrl=https://ceur-ws.org/Vol-1590/paper-04.pdf
|volume=Vol-1590
|authors=Xavier Ochoa
|dblpUrl=https://dblp.org/rec/conf/lak/Ochoa16
}}
==Simple Metrics for Curriculum Analytics==
Simple Metrics for Curricular Analytics

Xavier Ochoa
Escuela Superior Politécnica del Litoral
Vía Perimetral, Km. 30.5, Guayaquil, Ecuador
xavier@cti.espol.edu.ec

ABSTRACT

The analysis of a program curriculum is traditionally a very subjective task. Perceptions and anecdotes, faculty preferences, and content or objectives checklists are the main sources of information used to revise the structure of a program. This work proposes a list of simple metrics that can be easily extracted from readily available academic data containing information about the actual interactions of students with the curriculum. These metrics, divided into time- and performance-related groups, are calculated at the program level. The use of these metrics provides objective information on which to base discussions about the current state and efficiency of the curriculum. To exemplify the feasibility and usefulness of the metrics, this work presents some illustrative analyses that make use of the simple curriculum metrics.

CCS Concepts

• Applied computing → Education

Keywords

Learning Analytics, Curriculum Analytics

1. INTRODUCTION

Learning Analytics has traditionally been applied to understand and optimize the learning process at the course level. The learning process is analyzed through the captured interactions between students and the instructor, content, or tools. However, Learning Analytics is not restricted to act at this level. Adapted techniques, applied to different sources of information, can be used to understand and optimize learning at the program level, as exemplified by the works of Pechenizkiy et al. [8] and Mendez et al. [7]. Due to the interconnection between learning at the course and the program levels, Program Analytics is an indispensable complement to traditional Learning Analytics for an effective end-to-end learning process.

There are several sources of information that can be used to analyze a program curriculum. The first main categorization of this information responds to its level of objectivity. Surveys about needs, perceptions, and sentiments are a common tool in curriculum analysis. These surveys can be directed to students [6], faculty [7], alumni [3], or the labor market [5]. The results of these surveys provide subjective information. On the other hand, curriculum analysis can also employ factual data obtained from the curriculum and its usage. This data can be classified as objective information.

The objective information can be further classified into three main groups:

• Intrinsic: the information that is contained in the curriculum itself. For example, Sekiya et al. used the descriptions provided in the syllabi of several Computer Science curricula to compare their compliance with the Computer Science Curriculum recommendation from ACM [9].

• Extrinsic: the information external to the program that influences its content or structure. For example, Sugar et al. [10] found the required multimedia production competencies for instructional designers by compiling information from instructional design job advertisements.

• Interaction: the information that is generated when students interact with the curriculum. The most common interaction information is the course selection and the grades obtained by students, commonly referred to as student academic records. For example, Bendatu and Yahya [1], inspired by the curriculum mining idea of Pechenizkiy et al. [8], used student records to extract information about the course-taking behavior of students.

From all these sources of data, this work concentrates on curriculum interaction data for three main reasons. First, it is automatically captured and readily available for any running program.
Second, contrary to the intrinsic information, academic records are easier to analyze and understand. And finally, the relative uniformity in which this information is represented and stored makes it an ideal target for analysis techniques that can be shared between programs and institutions.

The structure of this paper is as follows: Section two proposes an initial list of useful and easy-to-obtain metrics that can be extracted from curriculum interaction data. Section three validates the ideas behind the metrics through their use in illustrative curriculum analyses. The paper closes with conclusions and recommendations for further work.

2. CURRICULUM METRICS

Metrics are objective measurements or calculations of the characteristics of an object that simplify its understanding and analysis. While obtaining the value for a given metric is not the same as performing an analysis, metrics are the base of quantitative analytics. This work proposes curriculum interaction metrics that can be used to perform quantitative analysis of the status and efficiency of program curricula. These proposed metrics will be calculated exclusively from curriculum interaction data (academic records).

Academic records can be seen as the capture of the interactions between students and the program curriculum. Table 1 presents an example of the usual information present in the records of an academic institution. As a minimum, academic records contain information about two main interaction events: 1) the decision of the student to register in a given course during a given academic period, and 2) the level of success of the student within the chosen courses. Due to these two different interaction aspects, the curriculum interaction metrics will be grouped into the two sets described in the subsections below.

Table 1: Example of Academic Records
Student Id    Course Id    Semester    Grade
200002608     ICM00604     2001-1S     6.75
200002608     FIEC04341    2001-2S     8.32
200225076     ICM00604     2002-1S     4.23
200300341     ICF00687     2003-2S     9.01

To obtain a first insight into the values generated by these metrics, they will be applied to a real Computer Science program in a medium-sized university. This program will serve as a case study. The curriculum of this program can be seen in Figure 1.

Figure 1: Courses in the CS Curriculum

2.1 Temporal metrics

In academic programs where students have the flexibility to select courses at different temporal points during their studies, that selection could provide useful information for curriculum analyzers. This work proposes three metrics associated with the temporal information of the academic record.

2.1.1 Course Temporal Position (CTP)

This simple metric measures the average academic period (semester or year) in which a course is taken by the students of a program. This information can be used to establish the real position of a course in the program.

To calculate this metric, the raw academic period information needs to be converted into a relative value. For example, in a semester-based program, if a student started their studies during the first semester of 2004 and took the relevant course during the second semester of 2006, the relative period will be 6, because the course was taken in the sixth semester relative to the student's first semester. To avoid inflating the metric, only active periods, that is, periods where the student has been actively pursuing the program, should be counted. Once the relative period of a course is calculated for all the N students that have approved the course, the average is calculated according to Equation 1, where RP_{s,c} is the relative period of the analyzed course c for a given student s. Additionally, this metric can be configured to obtain the temporal position when a course was initially taken or when it was finally approved. Depending on the type of analysis, these two different versions of the metric could be useful.

CTP_c = (1/N) Σ_{s=1..N} RP_{s,c}    (1)

When this metric is calculated for all the core courses of the Computer Science case study program (Table 2), it is clear that there are considerable differences between the semester in which a course is programmed and the average semester in which students approve that course. The largest difference corresponds to Object-Oriented Programming. This course is programmed to be taken during the third semester; however, students, on average, are approving it during the sixth semester. On the other side of the spectrum, Discrete Mathematics is programmed to be taken during the fourth semester, but students are approving it earlier (third semester). This information could be used to restructure the curriculum.

Table 2: Values of planned semester vs. CTP for all the core courses in the CS Program
Course                                       Planned Semester    CTP
OBJECT-ORIENTED PROGRAMMING                  3                   5.768965517
HARDWARE ARCHITECTURES                       6                   8.725
OPERATING SYSTEMS                            8                   10.51557093
PROGRAMING LANGUAGES                         5                   7.457478006
DIGITAL SYSTEMS I                            5                   7.303882195
ELECTRICAL NETWORKS                          4                   6.238329238
HUMAN-COMPUTER INTERACTION                   8                   10.19935691
SOFTWARE ENGINEERING II                      8                   9.97318612
SOFTWARE ENGINEERING I                       7                   8.920821114
ALGORITHM ANALYSIS                           5                   6.903743316
DIFERENCIAL EQUATIONS                        3                   4.868390129
DATABASE SYSTEMS I                           6                   7.845737483
ARTIFICIAL INTELLIGENCE                      8                   9.504983389
ORAL AND WRITTEN COMMUNICATION TECHNIQUES    1                   2.498585573
MULTIVARIATE CALCULUS                        2                   3.471134021
GENERAL CHEMISTRY                            1                   2.294483294
PROGRAMMING FUNDAMENTALS                     2                   3.252823632
DATA STRUCTURES                              4                   4.946681175
STATISTICS                                   5                   5.934782609
BASIC CALCULUS                               1                   1.846450617
BASIC PHYSICS                                1                   1.804273504
LINEAR ALGEBRA                               2                   2.791219512
COMPUTING AND SOCIETY                        2                   2.356042174
ECOLOGY AND EVIRONMETAL EDUCATION            4                   4.025195482
ECONOMIC ENGINEERING I                       7                   6.876140808
DISCRETE MATHEMATICS                         4                   3.333333333

2.1.2 Temporal Distance between Courses (TDI)

This metric establishes how many academic periods, on average, pass between a student taking two different courses. This information can be used to establish the actual sequence in which courses are taken.

While a simple way to calculate this metric would be to subtract the CTP of the second course from that of the first, information about the actual time difference for each student is lost due to the average nature of the CTP. To calculate TDI (Equation 2), the relative periods of the relevant courses (c1 and c2) are subtracted for each student; then the average is taken.

TDI_{c1,c2} = (1/N) Σ_{s=1..N} (RP_{s,c2} − RP_{s,c1})    (2)

When applied to the CS case study program, it becomes apparent that courses that should be taken in sequence are actually taken two or more semesters apart. For example, reviewing the course positions in Figure 1 and the values in Table 2, it is clear that subjects like Differential Equations should be taken immediately after Linear Algebra. In reality, they are taken, on average, two semesters apart. This information could be useful to better guide students in course selection.

2.1.3 Course Duration (CDU)

This metric measures the average number of academic periods that students need to pass a given course. It provides information about the effect that a course has on the length of the program.

CDU is obtained by subtracting the relative period of the first time each student took the course (RPfirst_{c,s}) from the relative period in which the student finally passed it (RPpass_{c,s}), and then averaging these values over all students (Equation 3). A variation of this metric only considers the periods in which the course was actually taken. In this case, the metric is identical to the average number of times that students need to repeat the course before passing.

CDU_c = (1/N) Σ_{s=1..N} (RPpass_{c,s} − RPfirst_{c,s})    (3)

When CDU is applied to the CS case study program, the values (Table 3) present some interesting results. Some courses perceived as difficult, for example Basic Calculus, take 2 semesters to be approved. However, other courses also considered difficult, for example Software Engineering, are passed on the first attempt.

Table 3: CDU values for all the core courses of the CS Program, ordered from largest to smallest
Course                                       CDU
BASIC CALCULUS                               2.213775179
PROGRAMMING FUNDAMENTALS                     1.873074101
STATISTICS                                   1.804930332
BASIC PHYSICS                                1.743679775
DIFERENCIAL EQUATIONS                        1.730544747
ELECTRICAL NETWORKS                          1.586794462
LINEAR ALGEBRA                               1.534738486
DATA STRUCTURES                              1.439759036
GENERAL CHEMISTRY                            1.438584316
MULTIVARIATE CALCULUS                        1.426287744
PROGRAMING LANGUAGES                         1.415881561
OBJECT-ORIENTED PROGRAMMING                  1.285101822
DISCRETE MATHEMATICS                         1.268479184
DIGITAL SYSTEMS I                            1.263420724
DATABASE SYSTEMS I                           1.247706422
ARTIFICIAL INTELLIGENCE                      1.236245955
ALGORITHM ANALYSIS                           1.230769231
COMPUTING AND SOCIETY                        1.207446809
OPERATING SYSTEMS                            1.205042017
ECOLOGY AND EVIRONMETAL EDUCATION            1.149152542
ORAL AND WRITTEN COMMUNICATION TECHNIQUES    1.097040606
ECONOMIC ENGINEERING I                       1.093867334
HUMAN-COMPUTER INTERACTION                   1.05229794
HARDWARE ARCHITECTURES                       1.037356322
SOFTWARE ENGINEERING II                      1.026479751
SOFTWARE ENGINEERING I                       1.017492711

2.2 Difficulty metrics

Each time a student undertakes a course, performance information is captured and stored. The way in which this information is represented varies, but it usually involves a grading scale. These scales can be categorical (letters, passing/not-passing, etc.) or numerical (20 out of 100, 4 out of 5, etc.). The information stored in student grades can be processed to produce useful information about the difficulty of the different courses in the program. This work summarizes some simple difficulty metrics proposed by previous works and proposes a set of new profile-based metrics.

2.2.1 Simple Difficulty Metrics

The most basic metrics of the difficulty of a course are the passing rate (PR), the number of students that have approved the course divided by the number of students that have taken the course, and the average grade (AG), the sum of the grades of all students (converted to a numerical value) divided by the number of students. These metrics, however, are not comparable between courses because they depend on the group of students that take the course. A course with relatively good students will have a better PR and AG than a course with only average or weak students.

Caulkins et al. [2] proposed more robust difficulty metrics. Two metrics, Grading Stringency, also called β (Equation 4), and Multiplicative Magnitude, also called α (Equation 5), eliminate the bias introduced by the group of students taking the course by subtracting from the GPA of each student (GPA_s) the grade that he or she obtained in the course (r_{s,c}) and averaging those values over all N_c students. However, the calculation of the β and α metrics assumes a normal distribution of grades, which is usually not the case.

β_c = (1/N_c) Σ_{s=1..N_c} (GPA_s − r_{s,c})    (4)

α_c = ( Σ_{s=1..N_c} GPA_s² ) / ( Σ_{s=1..N_c} r_{s,c} · GPA_s )    (5)

These metrics were applied to the CS case study and reported in a previous work [7].

2.2.2 Profile-Based Metrics

Simple difficulty metrics (PR, AG, β, and α) reduce the difficulty of a course to a single number. However, as demonstrated by Mendez et al. [7], course difficulty is different for different types of students. To account for this difference, this work proposes a set of profile-based difficulty metrics.

The basic idea behind profile-based metrics is to divide the population of students into different groups according to their performance (usually their GPA). For example, in a program with grades between 0 and 10 and a passing grade of 6, students could be grouped with the following schema: students with GPA higher than 8.5, GPA of 7.5 to 8.5, GPA of 6.5 to 7.5, GPA of 5.5 to 6.5, and GPA lower than 5.5. Then the relevant metric for a course is calculated separately for each group, using only information from the performance of its members.

The use of profiles for the difficulty metrics reduces the bias of the PR and AG, as each is calculated only over similar students in different courses. Also, the profile-based metrics preserve the basic grade distribution shape for β and α. The proposed profile-based difficulty metrics are:

• Course Approval Profile (CAP): the profile-based version of the Passing Rate (PR) metric. For each student group, the number of students in that profile that have passed the course in a given period is divided by the number of students in the group that have taken the course in the same period.

• Course Performance Profile (CPP): the profile-based version of the Average Grade (AG) metric. For each group of students that have taken the course, the AG is calculated.

• Course Difficulty Profile (CDP): the profile-based version of the metrics proposed by Caulkins et al. It can be Additive (CDP-β) or Multiplicative (CDP-α), depending on the difficulty metric used for each group.

The result of the profile-based difficulty metrics is a vector.
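The temporal metrics of Section 2.1 can be sketched in a few lines of code. The snippet below is illustrative, not from the paper: the record layout and function names are assumptions, relative periods are taken as already computed per student (Section 2.1.1), and CDU is counted inclusively, so that a course passed on the first attempt yields 1, consistent with the scale of the values in Table 3.

```python
# Illustrative sketch: CTP, TDI and CDU over toy academic records.
# Each record is (student, course, relative_period, passed), where
# relative_period is relative to the student's first active semester.
from collections import defaultdict

records = [
    ("S1", "LINEAR ALGEBRA",        2, True),
    ("S1", "DIFERENCIAL EQUATIONS", 4, True),
    ("S2", "LINEAR ALGEBRA",        3, True),
    ("S2", "DIFERENCIAL EQUATIONS", 4, False),  # failed first attempt
    ("S2", "DIFERENCIAL EQUATIONS", 5, True),
]

def ctp(course):
    """Course Temporal Position (Eq. 1): mean relative period of approval."""
    periods = [p for (_, c, p, ok) in records if c == course and ok]
    return sum(periods) / len(periods)

def tdi(c1, c2):
    """Temporal Distance (Eq. 2): mean per-student period difference."""
    rp = defaultdict(dict)
    for s, c, p, ok in records:
        if ok and c in (c1, c2):
            rp[s][c] = p
    diffs = [v[c2] - v[c1] for v in rp.values() if c1 in v and c2 in v]
    return sum(diffs) / len(diffs)

def cdu(course):
    """Course Duration (Eq. 3), counted inclusively: a first-attempt pass
    yields 1, a pass one period after the first attempt yields 2."""
    first, passed = {}, {}
    for s, c, p, ok in records:
        if c == course:
            first[s] = min(p, first.get(s, p))
            if ok:
                passed[s] = p
    return sum(passed[s] - first[s] + 1 for s in passed) / len(passed)
```

With these toy records, `ctp("DIFERENCIAL EQUATIONS")` is 4.5, `tdi("LINEAR ALGEBRA", "DIFERENCIAL EQUATIONS")` is 2.0 (the two-semester gap discussed in Section 2.1.2), and `cdu("DIFERENCIAL EQUATIONS")` is 1.5.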
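The profile-based metrics can be sketched in the same spirit. The snippet below is an illustrative reading of CAP and CPP, not the paper's implementation: it assumes the five GPA bands of the example grouping above (0-10 scale, passing grade 6) and represents the takers of a course as a list of (student GPA, grade obtained) pairs.

```python
# Illustrative sketch: Course Approval Profile (CAP) and Course
# Performance Profile (CPP) over GPA-based student groups.
BANDS = [(8.5, float("inf")), (7.5, 8.5), (6.5, 7.5),
         (5.5, 6.5), (float("-inf"), 5.5)]  # highest performers first

def by_band(takers):
    """Split (gpa, grade) pairs into the five GPA bands."""
    return [[g for gpa, g in takers if lo <= gpa < hi] for lo, hi in BANDS]

def cap(takers, passing=6.0):
    """CAP: passing rate per GPA band (None where the band is empty)."""
    return [sum(g >= passing for g in grp) / len(grp) if grp else None
            for grp in by_band(takers)]

def cpp(takers):
    """CPP: average grade per GPA band (None where the band is empty)."""
    return [sum(grp) / len(grp) if grp else None for grp in by_band(takers)]
```

The result of each metric is a vector with one entry per student group; for a hypothetical course, `cap([(9.0, 8.0), (9.2, 9.5), (7.0, 6.5), (7.1, 5.0), (4.8, 3.0)])` yields `[1.0, None, 0.5, None, 0.0]`: the strongest students always pass, mid-range students pass half the time, and the weakest group fails.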
This representation enables the use of more sophisti- It is common to find curricula with small sequences of cated data mining techniques to compare and group courses related courses. When those sequences are designed, it is according to their difficulty. expected that students follow the courses one after another All the difficulty metrics could also be calculated for each in consecutive periods. This is specially important for dif- Course-Instructor pair to provide a better difficulty estima- ficult courses such as Calculus or Physics where concepts tion given that the characteristics and grade stringency of learned in a previous course are necessary to master the each instructor could bias the metric result if averaged over concepts of the next one. However, students, specially in all instructors. flexible programs, could neglect taking some courses due When applied to the CS case study, the profiled metrics to di↵erent factors (difficulty, personal preferences, reduced are able to highlight di↵erent patterns among courses. For available time, etc.) If too much time pass between courses, example, in Figure 2, courses perceived as easy, such as Oral some of the previously learned concepts could be forgotten and Written Communication have a very similar Profile Ap- by the time the next course requires them, generating lower proval Rate for all but lowest performing students. On the than expected performance. other hand, difficult courses, such as Di↵erential Equations To find if there are courses that are consistently neglected and Programming Fundamentals, have a very steep decrease by students, the Temporal Distance between Courses (TDI) in Approval Rate for di↵erent type of students. Another ex- can be used. TDI is applied to each pair of consecutive ample can be sen in Figure 3, where the Profiled Difficulty courses in the analyzed sequence. If a pair of expected is represented. 
For courses perceived as easy, such as Eco- consecutive courses have a TDI value higher than a thresh- nomic Engineering, improve the GPA of all but the lowest old (for example 1.5) the second course could be consid- performing students. Difficult courses, however, negatively ered neglected and actions should be taken to encourage the a↵ect the GPA of all students in a degree related to their students to take them as originally planned (for example, actual GPA, as is the case for Programming Fundamentals. adding the second course as a prerequisite to a course with TDI between 2 and 2.5 from the first course). 3. CURRICULUM ANALYSIS 3.3 Bottlenecks Identification The main purpose of calculating a set of well understood Due to economic constraints, the time that a student takes metrics over the di↵erent courses of a program curriculum in completing the program has been of great interest for aca- is able to easily find answers through more complex analysis demic institutions. However, it is not always clear which based on a combination of the metrics’ results. This section courses are the bottlenecks that reduce the overall through- provides five illustrative examples of these analysis using put of the program. only the temporal and difficulty metrics presented before. One way to identify the o↵ending courses is to convert the curriculum into a graph. Each course will be a node in this 3.1 Course Concurrency graph. A edge will connect each pair of courses. The weight One of the main tasks in curriculum analysis is to deter- of each edge will be equal to the TDI between the courses it mine the workload that a student will receive over a given connects. All the edges with weights lower than 1 and higher academic period. It is a usual practice that instructors from than 2 are removed to leave only courses taken in sequence. 
concurrent courses, that is, courses that are taken together Then the course with lowest CTP is selected as the initial in a period, interchange information about their course load node and the critical path is found in the graph. The critical (homework, projects, etc.) to avoid to overload the students path determines the longest sequential path from the initial over specific times during the period, for example near the course. For each of the nodes in the critical path, the course exams). However, it is not always easy to determine which duration (CDU) is calculated. Those courses in the critical courses are actually concurrent, specially if the program if path with the higher CDU could be flagged as bottlenecks flexible. because they are likely to increase the number of periods This analysis can be performed mainly in two ways. With- that a student has to stay in the program. out previously calculated metrics, the recommended tech- nique is to use a frequent itemset mining technique, such as 3.4 Section Planning FP-Growth [4]. This technique discover courses commonly Physical or regulatory limitations often determine the max- taken together more than a given percentage of times (sup- imum numbers of students in a given class. When there are port). However, it is not easy for instructors to determine more students than places in a class, it is common prac- the right value of the support and the crisp sets that this tice to create additional sections of the course taught either algorithm return hide information about less frequent but by the same or a di↵erent instructor. Planning the number also occurring course concurrences. of sections needed for the next period, before the end of the In the second method, the determination of concurrency current period is sometimes a challenge and usually provides between courses can be easily obtained from the Course unreliable results. This leads to wasting of resources (for ex- Temporal Position (CTP) metric. 
For example, in a semester- ample, two half-full sections) or under-served students (for based program, all courses with at CTP between 1 and 1.5 example, students that can not follow the course during the could be considered to be part of the first semester, while period due to full sections). all the courses with a CTP between 1.5 and 2.5 could be The average passing rate it the usual way in which the considered to be in the second semester. Moreover, overlap- forecast about the number of students that will be available ping sets could be used to assure that less frequent, but also to take the next courses is calculated. However, given that relevant concurrences are taken into account in the period each period the composition of students varies, the pass- workload discussions. ing rate does not remain constant, leading to inaccurate re- Figure 2: Profiled Approval Rate for di↵erent courses in the CS program sults. The use of the profile-base approval metric (CAP) tered with similar courses. Presenting this information for could provide a better way to forecast the actual number all courses in the program could help instructors to associate of students that will pass the course because it takes into the difficulty of known courses to new or unknown courses. account the di↵erent performance of the students taking the This potentially could lead to a better recommendation to course. These CAP could be refined by using a combination the student. of Course-Instructor to also take into account the grading stringency of the instructor. 4. CONCLUSIONS AND FURTHER WORK 3.5 Course Similarity Di↵erently from data produced at course-level, program- One of the main curricular decisions that students make level data tend to be more homogeneous between programs is the selection of the course load for each period. The num- and institutions. 
This similarity could lead to the develop- ber and difficulty of the courses has a been found to have ment of a sub-field of Learning Analytics with a common set direct impact on the performance of the students [7]. This of metrics and methodologies for Program Curriculum anal- decision is so important that it is common for academic in- ysis that could be called Curricular Analytics. This work is stitutions to provide course-selection counseling for students one of the first steps towards the creating this sub-field. that seems to be struggling with their workload. The coun- Even simple metrics, when well defined and transferable seling session, however, only transfer the burden of course between programs, have the capacity to improve the way in selection to instructors or professors that do not necessarily which curricula are analyzed and improved. The list of met- have a current picture of the difficulty and load of all the rics presented in this work is by no means comprehensive, courses in the program. The decision is taken with a better but provides a starting point from which more advanced and background knowledge, but still perceptions and beliefs are informative metrics could be created. the main sources of information. The presented illustrative analysis served as an initial val- The vector nature of the profile-based difficulty metrics idation of the feasibility and usefulness of the metrics. How- could be exploited to apply straight-forward clustering tech- ever, a series of evaluation studies with real data from exist- niques to group the courses according to their type of diffi- ing programs is needed before these metrics could be safely culty. These groupings could provide an easier way to char- used by practitioners to draw conclusions from their pro- acterize courses. For example, courses with the same pass- grams. 
The operational complexity of these studies is very ing rate AG, could be grouped separately according to their low given that only the raw data and simple computational Difficulty profile (CDP). Difficult courses, with a linearly tools (for example a spreadsheet) are needed to obtain the decreasing negative for students with lower GPAs, will be metrics. On the other hand, measuring the informational clustered together. The same will happen to easy courses value of the metrics to solve real-world questions requires a that have a constant value among the groups. Courses more complex quantitative and qualitative analysis. with other distributions (for example, very easy for good The relative homogeneity of the data could also lead to the performers, but hard for bad performers) will also be clus- creation of Curricular Analytics tools or plugins that could Figure 3: Profiled Difficulty for di↵erent courses in the CS program incorporate all the tested metrics and analysis developed assist curriculum designers. Journal of Engineering inside this sub-field. The existence of this kind of easy-to- Education, 88(1):43–51, 1999. use tools could help in transferring the research results into [6] H. Lempp, C. Seale, et al. The hidden curriculum in the practitioners field much faster than what has happened undergraduate medical education: qualitative study of in Learning Analytics in general, where research results are medical students’ perceptions of teaching. BMJ, much harder to make inroad in the day-to-day operation of 329(7469):770–773, 2004. academic institutions. [7] G. Mendez, X. Ochoa, K. Chiluiza, and B. de Wever. Finally, this work is a call to other Learning Analytics Curricular design analysis: A data-driven perspective. researchers to start focusing on the di↵erent levels of learn- Journal of Learning Analytics, 1(3):84–119, 2014. ing and education and the interrelation between those lev- [8] M. Pechenizkiy, N. Trcka, P. De Bra, and P. Toledo. els. 
While the focus on course-level analytics could help Currim: curriculum mining. In Educational Data to improve the learning process in the classroom, only a Mining 2012, 2012. holistic approach could ensure that these improvements are [9] M. Sahami, S. Roach, E. Cuadros-Vargas, and also reflected in the efficiency and e↵ectiveness of learning D. Reed. Computer science curriculum 2013: programs and that society will receive the benefit of better reviewing the strawman report from the acm/ieee-cs prepared individuals. task force. In Proceedings of the 43rd ACM technical symposium on Computer Science Education, pages 5. REFERENCES 3–4. ACM, 2012. [10] W. Sugar, B. Hoard, A. Brown, and L. Daniels. [1] L. Y. Bendatu and B. N. Yahya. Sequence matching Identifying multimedia production competencies and analysis for curriculum development. Jurnal Teknik skills of instructional design and technology Industri, 17(1):47–52, 2015. professionals: An analysis of recent job postings. [2] J. Caulkins, P. Larkey, and J. Wei. Adjusting GPA to Journal of Educational Technology Systems, Reflect Course Difficulty. H. John Heinz III School of 40(3):227–249, 2012. Public Policy and Management, 1996. [3] R. Davis, S. Misra, and S. Van Auken. A gap analysis approach to marketing curriculum assessment: A study of skills and knowledge. Journal of Marketing Education, 24(3):218–224, 2002. [4] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD Record, volume 29, pages 1–12. ACM, 2000. [5] J. D. Lang, S. Cruse, F. D. McVey, and J. McMasters. Industry expectations of new engineers: A survey to