
Rankings of Students Based on Experts' Assessment of Levels of Verification of Learning Outcomes by Test Items

Aleksandra Mreła

Oleksandr Sokołov

Kujawy and Pomorze University in Bydgoszcz, Technical Department, Toruńska str. 55-57, 85-023 Bydgoszcz, Poland
Nicolaus Copernicus University in Toruń, Faculty of Physics, Astronomy and Informatics, Grudziądzka str. 5, 87-100 Toruń, Poland

The paper presents methods of preparing rankings of students based on the results of the final secondary school examination in mathematics in Poland. The method currently in use is based on the percentage of points earned and does not take into account the levels at which students have acquired the learning outcomes. The data used in this article contain the results of students who earned the same number of points. The presented ranking methods are based on experts' assessment of the levels at which test items verify the learning outcomes and on methods of fuzzifying this assessment. The resulting rankings differ depending on the method applied.

Keywords: Ranking of students, Fuzzification, Quality assurance

Development of information technologies has contributed significantly to the teaching methods and students’ assessment. Tutorial programs, interactive tests, different monitoring studies and state programs for automated evaluation of knowledge and skills have been introduced recently.

Testing has been applied widely in distance education and, during the implementation of the Bologna Process, in students' self-education. Automated testing has also spread to manufacturing, where personnel management is being transformed into a continuous process of training (with subsequent testing and assessment of trainees). The distinctive feature of such systems is that the teacher's role in learning and assessment is much narrower and the results are evaluated automatically. This is driven both by the need to assess a large number of trainees simultaneously and by the ideology of automated learning itself: self-consistent learning and independent evaluation. Among the major tasks are ensuring the comparability of the results of different tests, ranking students by their level of knowledge, and forming the final score for sets of tests. So-called "raw" scores, i.e., totals for the items completed successfully, can be applied only to a very limited extent (when testing is limited to identifying the level of knowledge of a particular topic and cannot be integrated with other results). The usefulness of a test score depends not only on the quality of the test but also on the methods of comparing and interpreting the primary (raw) scores of the tested group.

It is therefore important to analyze the existing methods of comparing and integrating the scores of various tests and to study the quality of the assessment of student groups, treating the diversity of the resulting scores as a quality criterion for the estimation methods.

In 1999 the Polish Government decided to take part in the Bologna Process [1]. As a result, all Polish educational institutions, when they begin designing curricula, define the learning outcomes which students have to acquire during their studies and whose acquirement the teachers have to verify.

The mathematics curriculum in Polish secondary schools was approved by the Polish Ministry of Education [ 2 ]. This curriculum considers 5 learning outcomes which students are taught during three-year studies. At the end of the studies students take the written final secondary school examination.

The examination is divided into two parts. The first part consists of 25 multiple-choice items, for each of which students can earn 1 point for a correct answer. The second part consists of 9 items, for solving which students can earn more than 1 point. The data used in this paper to discuss the methods of preparing rankings refer only to the results of the first part of the examination.

The data comprise the results of the final secondary school examination in mathematics in one Polish secondary school in 2015. The examination was taken by 149 students. For the first part of the examination students can earn up to 25 points. This paper discusses the group of 18 students who earned exactly 18 points, so they all occupy the same position in the ranking of students based on the number of points earned or on the average mean.

Sometimes there is a need to distinguish between such students, for example when 5 of these 18 students have to be chosen. The points-based ranking does not help us choose the 5 best students unless we decide to apply additional criteria.

In this paper we discuss methods of preparing rankings [6, 7] of students that take into consideration both the results of the mathematics test and experts' assessment of the levels at which test items verify the learning outcomes. We discuss the results of the 18 students from one Polish high school who earned a total score of 18 points on the first 25 items of the final secondary school examination in mathematics.

In order to calculate the levels of acquiring learning outcomes by students we will use the theory of fuzzy sets, which was introduced by L.A. Zadeh [12] in 1965. In 1975 he generalized the concept of type 1 fuzzy sets and introduced type 2 fuzzy sets [13].

2 Crisp Relations

To build the relations between learning outcomes and items we will use the description of four learning outcomes LO1 – LO4 given in [2]:
- LO1 – Student interprets mathematical texts. After solving the tasks, the student interprets the achieved result.
- LO2 – Student uses simple, well-known mathematical objects.
- LO3 – Student chooses a mathematical model for a simple situation and critically assesses the pertinence of the model.
- LO4 – Student applies a strategy which results clearly from the content of the task.

On the basis of the principles of assessment published in [3], where for each learning outcome the experts indicate the items prepared for verifying its acquirement by students, the crisp relation $R_1$ between learning outcomes and items was prepared. The value $R_1(j,k)$ for learning outcome $j$, where $j = 1,\ldots,4$, and item $k$, where $k = 1,\ldots,25$, is equal to 1 if the experts decided that item $k$ can verify the acquirement of learning outcome $j$, and it is equal to 0 otherwise. Note that $LO_i \cdot LO_j = 0$ for each $i \neq j$ and each item.

The value $R_2(k,i)$ of the relation $R_2$ between item $k$, where $k = 1,\ldots,25$, and student $i$, where $i = 1,\ldots,149$, is equal to 1 if the student answered this item correctly, and it is equal to 0 otherwise.

To calculate the values of the relation $R_3$ between learning outcomes and students, the S-T composition is used.

Let us recall the definition of the S-T composition [11]. Let $R = \{((x,y), \mu_R(x,y))\}$ and $P = \{((y,z), \mu_P(y,z))\}$ be two relations with the membership functions $\mu_R$ and $\mu_P$. The S-T composition of the relations $R \subseteq X \times Y$ and $P \subseteq Y \times Z$ is the relation $R \circ P \subseteq X \times Z$ with the membership function defined as follows:

$$\mu_{R \circ P}(x,z) = S_{y \in Y}\, T\big(\mu_R(x,y), \mu_P(y,z)\big).$$
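The definition above can be sketched in a few lines of Python. The sketch assumes the algebraic T-norm (product) and the algebraic S-norm (probabilistic sum); the function names and the toy relations are illustrative and are not taken from the paper's data.

```python
from functools import reduce


def t_norm(a: float, b: float) -> float:
    """Algebraic T-norm: the product a * b."""
    return a * b


def s_norm(a: float, b: float) -> float:
    """Algebraic S-norm (probabilistic sum): a + b - a * b."""
    return a + b - a * b


def st_compose(r, p):
    """S-T composition of two relations given as nested lists.

    r[x][y] and p[y][z] are membership values in [0, 1]. The result
    c[x][z] aggregates T(r[x][y], p[y][z]) over all y with the S-norm.
    """
    n_y = len(p)
    n_z = len(p[0])
    return [
        [
            reduce(s_norm, (t_norm(row[y], p[y][z]) for y in range(n_y)), 0.0)
            for z in range(n_z)
        ]
        for row in r
    ]


# Hypothetical toy data: 2 learning outcomes, 3 items, 2 students.
R1 = [[1, 1, 0],   # LO1 is verified by items 1 and 2
      [0, 0, 1]]   # LO2 is verified by item 3
R2 = [[1, 1],      # item 1: both students answered correctly
      [0, 1],      # item 2: only the second student answered correctly
      [0, 1]]      # item 3: only the second student answered correctly
R3 = st_compose(R1, R2)
print(R3)
```

With crisp (0/1) values the composed level equals 1 as soon as a student answers correctly at least one item that verifies the learning outcome, which limits how well a crisp relation can separate students.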

It is shown in [9] that for educational purposes the algebraic T-norm and S-norm are better suited than the most popular pair, the minimum T-norm and the maximum S-norm, so we will use the S-T composition with the algebraic T-norm and S-norm. With the crisp relations, all 18 students acquired learning outcomes LO1 – LO4 at level 1, except student S10, whose level of acquirement of learning outcome LO4 was 0. Thus we cannot distinguish between these students, with the exception of S10, and we cannot prepare a ranking.

3 Item Response Theory

According to the Item Response Theory (IRT) mathematical ability (understanding mathematics and skills in solving tasks) is a latent trait which cannot be measured. This theory describes the method of measuring this ability on the basis of results of the test which was solved by the group of students [ 4, 8, 10 ].

Following the algorithm in [10], the mathematical abilities described by learning outcomes LO1 – LO4 were calculated, first for all of them together and then for each learning outcome separately, and collected in Table 2.

Table 2. Students grouped by position according to the calculated mathematical abilities.
LO1 – LO4 together: position 1: S1, S3, S5, S7, S11, S12, S17; position 2: S4, S6, S8, S10, S13, S14, S15, S16, S18; position 3: S2, S9
LO1: position 1: S1, S2, S6, S9, S10, S15, S17, S18; position 2: S4, S5, S8, S11, S14, S16; position 3: S3, S7, S12, S13
(row label missing): S2, S4, S5, S8, S9, S14, S15, S17; S1, S3, S6, S7, S11, S12, S13, S16, S18

The main problem with the levels of mathematical ability is that they take too few distinct values (too many students achieve the same level) to prepare rankings that differentiate students. For example, when all learning outcomes (LO1 – LO4) are taken into account, there are only 3 positions in the ranking: 7 students in the first position (S1, S3, S5, S7, S11, S12, S17), 9 students in the second (S4, S6, S8, S10, S13, S14, S15, S16, S18) and 2 students in the third (S2, S9).

We encounter a similar situation if the ranking is based on learning outcome LO1, LO3 or LO4 (there are only 3 positions in each case); only for learning outcome LO2 are there 6 positions.

If we impose further requirements on the ranking, e.g., assume that learning outcome LO1 is the most important, followed by LO2 and LO3, with LO4 the least important, then we can distinguish between students and, using the lexicographical order, prepare the ranking of students presented in Table 4.
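The lexicographical ordering described here can be sketched as follows. The sketch assumes that each student is described by a tuple of levels ordered from the most to the least important learning outcome, that higher levels are better, and that students with identical tuples share one position; the function name and the sample levels are hypothetical and are not taken from the paper's tables.

```python
def lexicographic_ranking(levels):
    """Rank students by tuples of levels, most important outcome first.

    levels maps a student label to a tuple (LO1, LO2, LO3, LO4).
    Returns a list of (position, [students]) pairs; students whose
    tuples are identical share the same position.
    """
    groups = {}
    for student, key in levels.items():
        groups.setdefault(tuple(key), []).append(student)
    # Python compares tuples lexicographically, so sorting the keys in
    # descending order puts the best LO1 first and breaks ties by LO2, etc.
    ordered = sorted(groups.items(), key=lambda item: item[0], reverse=True)
    return [
        (position, sorted(students))
        for position, (_, students) in enumerate(ordered, start=1)
    ]


# Hypothetical levels for four students.
levels = {
    "A": (0.9, 0.4, 0.7, 0.2),
    "B": (0.9, 0.6, 0.1, 0.1),
    "C": (0.5, 0.9, 0.9, 0.9),
    "D": (0.9, 0.4, 0.7, 0.2),
}
for position, students in lexicographic_ranking(levels):
    print(position, students)
```

Changing the order of the components in each tuple corresponds to choosing a different order of importance of the learning outcomes.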

Now we can differentiate students and prepare the ranking. Since the most important learning outcome was LO1, the students who were in the first position in the ranking based on this learning outcome occupy the first 5 positions, so we have differentiated them. It is interesting that student S10, who did not acquire learning outcome LO4, takes the first position in the ranking. Students who took the third position in the ranking based on learning outcome LO1 take positions 11-13.

4 First Type Fuzzification

Now we fuzzify the relation between learning outcomes and items by allowing the experts, who define the levels at which items verify the learning outcomes, to use values from the interval [0, 1]. The membership functions of the relation between a given learning outcome and the items are presented in Fig. 2.

Now, using the crisp relation between students and items and the type 1 fuzzy relation between learning outcomes and items, we can calculate, by means of the S-T composition, the type 1 fuzzy relation between learning outcomes and students, whose values denote the levels of acquirement of the learning outcomes by students. The values of this relation are presented in Table 5.
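As a worked illustration of this composition with the algebraic norms, suppose (hypothetically; these numbers are not taken from Fig. 2 or Table 5) that three items verify learning outcome $j$ at the levels 0.5, 0.8 and 0.9, and that student $i$ answered the first two items correctly and the third incorrectly. The level of acquirement is then the probabilistic sum of the products:

```latex
\begin{align*}
R_3(j,i) &= 1 - (1 - 0.5 \cdot 1)(1 - 0.8 \cdot 1)(1 - 0.9 \cdot 0) \\
         &= 1 - 0.5 \cdot 0.2 \cdot 1 \\
         &= 0.9 .
\end{align*}
```

With fuzzy verification levels a single correct answer no longer necessarily yields level 1, so students with different sets of correct answers can obtain different levels.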

Now we can prepare rankings on the basis of the levels of acquirement of each learning outcome LO1 – LO4 (separately) by the students. However, since students should acquire all learning outcomes, we prepare the ranking based on all of them using the lexicographical order. Assume that learning outcome LO1 is the most important, followed by LO2, LO3 and finally LO4. With this assumption we obtain the ranking of students in Table 6.

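Table 6. Ranking of students based on the lexicographical order of the levels of acquirement of learning outcomes.

Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Student | S2 | S17 | S9, S15 | S1 | S10 | S18 | S6 | S8 | S5

Position | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18
Student | S14 | S16 | S4 | S11 | S3 | S12 | S7 | S13 | –

First, we can notice that almost all students take different positions in the ranking; only two students, S9 and S15, share the same position. Now the best student is S2 and the poorest student is S13.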

This ranking shows that student S2 is the best one when learning outcome LO1 is the most important. Of course, if we choose a different order of importance of the learning outcomes, the ranking will be different. The possibility of preparing different rankings according to specific criteria is important for recruitment officers, because they need candidates with specific abilities and skills.

Comparing the fuzzification and the IRT method, we can see that IRT takes into consideration only the difficulties of items and the examinees' abilities. Our method makes it possible to calculate the levels of acquirement of learning outcomes taking into consideration one, a few or all learning outcomes.

5 Second Type Fuzzification

Now we fuzzify the relation between learning outcomes and items further by letting each expert who defined the levels of verifying learning outcomes give their own value from the interval [0, 1]. Sample membership functions of the relation between a given learning outcome and the items are presented in Fig. 3.

In order to prepare another ranking of students, we will use type 2 fuzzy relations [11,13].

Let $A = \{-0.5, -0.49, \ldots, 0, \ldots, 1.6\}$ be the base set of all secondary membership functions for $j = 1,2,3,4$, $i = 1,2,\ldots,149$ and $k = 1,2,\ldots,25$. Let $m_{j,k}$ and $s_{j,k}$ denote the average mean and the standard deviation of the values set by the experts for learning outcome $j$ and item $k$. Let each secondary membership function of the relation between learning outcomes and items be defined by the Gauss function

$$\mu_1(x,j,k) = \exp\!\left(-\frac{(x - m_{j,k})^2}{s_{j,k}}\right)$$

for each $j$, $k$ and $x \in A$.

Since students can earn 0 or 1 point, the secondary membership functions of the type 2 fuzzy relation between items and students can be the Gauss functions defined as follows:

$$\mu_2(x,i,k) = \exp\!\left(-\frac{(x - m_{i,k})^2}{s_{i,k}}\right)$$

for each $i$, $k$ and $x \in A$, where $m_{i,k} \in \{0,1\}$ and $s_{i,k} = 0.1$.

Now let the S-T composition of the relations $R_1$ and $R_2$ be defined as follows:

$$R_3(x,j,i) = 1 - \big(1 - R_1(x,j,1) \cdot R_2(x,1,i)\big) \cdot \ldots \cdot \big(1 - R_1(x,j,25) \cdot R_2(x,25,i)\big)$$

for each $j$, $i$ and $x$. Sample secondary membership functions of the type 2 fuzzy relation between learning outcomes and students obtained after the S-T composition are presented in Fig. 4.
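The pointwise composition above can be sketched in Python for a single learning outcome and a single student. The sketch assumes that the values $R_1(x,j,k)$ and $R_2(x,k,i)$ are given by the two Gauss functions defined above; the function names, the grid step of 0.01 and the sample expert statistics are illustrative assumptions.

```python
import math


def make_grid(lo=-0.5, hi=1.6, step=0.01):
    """Discretized set A = {lo, lo + step, ..., hi}."""
    n = round((hi - lo) / step)
    return [round(lo + k * step, 2) for k in range(n + 1)]


def gauss(x, m, s):
    """Gauss-shaped membership exp(-(x - m)^2 / s); requires s > 0."""
    return math.exp(-((x - m) ** 2) / s)


def compose_type2(grid, expert_params, answers, s_answer=0.1):
    """Secondary membership function for one (learning outcome, student) pair.

    expert_params: one (mean, spread) pair per item for a fixed learning
    outcome; answers: the student's 0/1 score for each item.
    Returns the composed value for every x in the grid.
    """
    result = []
    for x in grid:
        prod = 1.0
        for (m, s), answer in zip(expert_params, answers):
            r1 = gauss(x, m, s)              # learning outcome vs. item
            r2 = gauss(x, answer, s_answer)  # item vs. student
            prod *= 1.0 - r1 * r2            # algebraic T-norm inside
        result.append(1.0 - prod)            # algebraic S-norm outside
    return result


# Hypothetical expert statistics for three items and hypothetical answers.
grid = make_grid()
expert_params = [(0.8, 0.02), (0.3, 0.05), (0.6, 0.01)]
answers = [1, 0, 1]
mu3 = compose_type2(grid, expert_params, answers)
print(len(grid), max(mu3))
```

If all experts give the same value for an item, the standard deviation is 0 and the Gauss function is undefined; the sketch does not handle that case.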

Fig. 4. The secondary membership functions of the type 2 fuzzy relation between learning outcomes LO1 – LO4 and students (panels: Student S1, Student S2).

The next step is to find a method of comparing the students' results on the basis of the calculated secondary membership functions of the relation between learning outcomes and students. Let the level of acquirement of learning outcome $j$ by student $i$ be equal to the first coordinate $x_{j,i}$ of the maximal point of the secondary membership function. Moreover, as we can see (Figure 3), some of the functions are “slim” and some are “wide”. We can assume that if the function is “slim”, the likelihood of the result is higher, and if the function is “wide”, the likelihood is smaller.

Thus the likelihood of the level of acquiring learning outcome $j$ by student $i$, called the range, is defined as follows:

$$\mathrm{range}(j,i) = a_2 - a_1,$$

where

$$a_1 = \min\{x \in A : \mu_3(x,j,i) \ge 0.5\} \quad \text{and} \quad a_2 = \max\{x \in A : \mu_3(x,j,i) \ge 0.5\},$$

and $\mu_3(x,j,i)$ denotes the secondary membership function of the relation between learning outcomes and students. Thus for student $i$ and learning outcome $j$ we get the pair $(x_{j,i}, \mathrm{range}(j,i))$. Hence we obtain the set of levels of acquiring learning outcomes together with their ranges. The pairs of these values for learning outcomes LO1 and LO2 and all students are collected in Table 8.

To prepare the ranking we have to defuzzify the achieved results [11, 4]. We assume that the first value should be as great as possible and, for students who achieved the same first value, the second one should be as small as possible. On this basis we can prepare a ranking of students according to each learning outcome but, as in the previous sections, we present the ranking based on the lexicographic order, assuming that learning outcome LO1 is the most important, followed by LO2, LO3 and LO4. In this way we obtain the ranking of students presented in Table 9.

Comparing the rankings presented in Tables 6 and 9, we can notice that the first and last positions are the same and the positions of the other students are similar. However, using the type 2 fuzzy relations we have more information, because we can also describe the likelihood of acquirement of the learning outcomes by given students.
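The pair of values and the ordering rule described above can be sketched as follows. The sketch assumes that the secondary membership function is sampled on a grid, that the range is the width of the region where the function is at least 0.5, and that students are compared lexicographically over the learning outcomes with a greater level first and a smaller range as the tie-breaker; all names and sample values are hypothetical.

```python
def peak_and_range(grid, mu, level=0.5):
    """Return (x_peak, range) for a sampled secondary membership function.

    x_peak is the grid point where mu is maximal (the first one if there
    are several); range is a2 - a1, where a1 and a2 are the smallest and
    largest grid points at which mu >= level.
    """
    i_peak = max(range(len(mu)), key=mu.__getitem__)
    support = [x for x, value in zip(grid, mu) if value >= level]
    width = support[-1] - support[0] if support else 0.0
    return grid[i_peak], width


def ranking_key(pairs):
    """Sort key for one student; pairs is [(x, range), ...] for LO1..LO4.

    Negating x makes greater levels sort first, while the range itself
    makes narrower (more likely) results sort first among equal levels.
    """
    return tuple(c for x, width in pairs for c in (-x, width))


# Hypothetical sampled function on a coarse grid.
grid = [k / 10 for k in range(11)]
mu = [0, 0.1, 0.3, 0.6, 0.9, 1.0, 0.7, 0.4, 0.2, 0.1, 0]
print(peak_and_range(grid, mu))

# Hypothetical (level, range) pairs for two learning outcomes.
students = {
    "S_a": [(0.9, 0.3), (0.5, 0.2)],
    "S_b": [(0.9, 0.2), (0.4, 0.1)],
    "S_c": [(0.8, 0.1), (0.9, 0.1)],
}
order = sorted(students, key=lambda s: ranking_key(students[s]))
print(order)
```

In this toy example S_a and S_b have the same level for the first learning outcome, so the narrower range places S_b ahead of S_a.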

Even though this ranking does not differentiate students S6 and S18, it is not worse than the previous rankings. Moreover, it provides more information about the likelihood of acquirement of learning outcomes.

6 Results

Nowadays rankings of students are prepared very often. For example, to admit students to the first year of studies, all universities in Poland prepare a ranking of candidates based on the results of the final secondary school examination and the grades in specific subjects. Universities could choose students based on the levels of acquirement of learning outcomes, not only on the average mean. All students discussed in this paper occupy the same position after the first part of the examination.

The paper presents different methods of preparing rankings, based on the IRT algorithm and on three different levels of fuzzification of the relation between learning outcomes and items: crisp (two different positions in the ranking), type 1 (the ranking can be prepared by additionally using the lexicographical order of the levels of acquirement of learning outcomes by students) and type 2 (the ranking based on the lexicographical order of the levels of acquirement additionally gives information about the likelihood of this acquirement).

The IRT method takes into account only the difficulties of items and the examinees' abilities described as learning outcomes. Moreover, when we prepared the ranking based on the examinees' abilities for all discussed learning outcomes together, we obtained only 3 positions.

The ranking prepared on the basis of the crisp relation between learning outcomes and items had only 2 positions. After the first and second fuzzifications, we obtained rankings with 17 positions, and in the case of type 2 fuzzy relations we obtained additional information (the likelihood of the levels of acquirement of learning outcomes).

The next step is to find a more precise measure than the range for calculating the likelihood of acquirement of learning outcomes by students, for example the area between the x-axis and the Gauss functions.

References

1. European Higher Educational Area, Homepage, www.ehea.info, last accessed 2016/07/12.
2. Rozporządzenie Ministra Edukacji Narodowej z dnia 23 grudnia 2008 r. w sprawie podstawy programowej wychowania przedszkolnego oraz kształcenia ogólnego w poszczególnych typach szkół.
3. Zasady oceniania rozwiązań zadań, Egzamin maturalny w roku 2014/2015, Centralna Komisja Egzaminacyjna, www.cke.edu.pl, last accessed 2016/07/12.
4. Baker F.B.: The basics of Item Response Theory, ERIC Clearinghouse on Assessment and Evaluation, USA (2001).
5. Dobrosielski W.T., Szczepański, Zarzycki: A Proposal for a Method of Defuzzification based on the Golden Ratio - GR, Conference IWIFSGN, Cracow 2015, Novel Developments in Uncertainty Representation and Processing, pp. 75-84, Springer International Publishing (2016).
6. Duch, Wieczorek, Biesiada, Blachnik: Comparison of feature ranking methods based on information entropy, Proc. of Int. Joint Conf. on Neural Networks (IJCNN), Budapest 2004, IEEE Press, pp. 1415-1420 (2004).
7. Duch, Winiarski, Biesiada, Kachel A.: Feature Ranking, Selection and Discretization, Proc. Joint Int. Conf. on Artificial Neural Networks (ICANN) and Int. Conf. on Neural Information Processing (ICONIP), Istanbul, pp. 251-254 (2003).
8. Hambleton R.K., Swaminathan: Item Response Theory, Principles and applications, Springer Science+Business Media, LLC (1991).
9. Mreła A., Sokołov O., Katafiasz T.: Types of fuzzy relations’ composition applied to validation of learning outcomes at mathematics during final high school examination, in: Mreła A., Wilkoszewski P. (ed.): Nauka i technika u progu III tysiąclecia, Wydawnictwo Kujawsko-Pomorskiej Szkoły Wyższej w Bydgoszczy, Bydgoszcz, pp. 119-132 (2015).
10. Нейман Ю.М., Хлебников В.А.: Введение в теорию моделирования и параметризации педагогических тестов, М., Прометей (2000).
11. Rutkowski R.: Metody i techniki sztucznej inteligencji, PWN, Warsaw (2009).
12. Zadeh L.A.: Fuzzy sets, Information and Control 8, pp. 338-353 (1965).
13. Zadeh L.A.: The Concept of a Linguistic Variable and Its Application to Approximate Reasoning–1, Information Sciences, vol. 8, pp. 199–249 (1975).