Reliability of Decision in Testing Problems

Mikhail M. Lutsenko, Dzhemil A. Seytmanbitov, Anatoly M. Baranovskiy
Emperor Alexander I St. Petersburg State Transport University, Saint Petersburg, Russia
ml4116@mail.ru, dzhem93@gmail.com, bamvka@mail.ru

Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: A. Khomonenko, B. Sokolov, K. Ivanova (eds.): Selected Papers of the Models and Methods of Information Systems Research Workshop, St. Petersburg, Russia, 4-5 Dec. 2019, published at http://ceur-ws.org

Abstract

In this paper, a statistical game is defined and solved. Its solution consists of the optimal randomized decision rule, the probability of a correct decision under this rule, and the worst a priori distribution of the test-takers' knowledge levels. We have developed a method for assessing the accuracy and reliability of decisions made on the basis of test results. The proposed program makes it possible to assess the reliability of the solution for a test containing 10 items with different levels of difficulty and 11 different knowledge levels.

Introduction

The main purpose of any testing is to assess the test-takers' knowledge level and to make a decision based on the result. Unfortunately, the test result (the number of completed test items) depends not only on the test-takers' knowledge levels but also on many other factors that are hard to predict. So, an adequate model of the decision-making problem must include probabilistic components.

We need a flexible model for building optimal randomized solutions that takes into account various types of solutions, a priori distributions over the knowledge levels, and different item difficulties. Such a model can be built within the framework of statistical game theory [Lin97].

1 Methods and Algorithms

Let $\Theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$ be the set of possible test-taker knowledge levels, and let $X(\theta)$ be a random variable (the number of points scored by a test-taker with knowledge level $\theta$ when taking test $T$). Let $D = \{d_1, d_2, \ldots, d_n\}$ be the set of possible decisions that the decision maker can make based on the test results. Examples of such decision sets are the following.

Accurate assessment of the test-taker knowledge level:

$$d_1 = \theta_1,\ d_2 = \theta_2,\ \ldots,\ d_n = \theta_n$$

(decision $d_j$ states that the test-taker knowledge level is $\theta_j$).

Interval assessment of the test-taker knowledge level:

$$d_1 = \Delta_1,\ d_2 = \Delta_2,\ \ldots,\ d_m = \Delta_m,$$

where $\Delta_1, \Delta_2, \ldots, \Delta_m \subseteq \Theta$ is a set of partially intersecting intervals. Their elements can be interpreted as poorly prepared test-takers, satisfactorily prepared test-takers, etc., or as test-takers ready to execute task 1, task 2, etc. (decision $d_j$ states that the interval $\Delta_j$ includes the test-taker knowledge level $\theta$).

Let $h(d, \theta)$ be the benefit of the decision maker when decision $d$ is made and the knowledge level is $\theta$. Using the benefit function, many other sets of solutions can be modeled. For example, the decision maker's benefit function

$$h(d, \theta) = \begin{cases} 1, & \text{if } \theta \in \Delta(d), \\ 0, & \text{otherwise} \end{cases}$$

is built on any set of confidence intervals $\Delta_1, \Delta_2, \ldots, \Delta_m \subseteq \Theta$ [Lut03].

Let $X = \{1, 2, \ldots, N\}$ be the set of possible values of the random variable $X(\theta)$, and let $P_\theta(x) = P(X(\theta) = x)$ be the probability of the corresponding event. If the difficulties of all test items are the same, these probabilities are given by the formula for Bernoulli trials:

$$P_\theta(x) = C_N^x\, p^x (1 - p)^{N - x}.$$

When the difficulties of the items are different, we define a random variable $X_k(\theta)$. Its value is one if the test-taker with knowledge level $\theta$ completed the $k$-th item of the test and zero otherwise. Let $p(k, \theta) = P(X_k(\theta) = 1)$ be the probability of the corresponding event. Then the random variable $X(\theta)$ is equal to the sum of the corresponding random variables:

$$X(\theta) = X_1(\theta) + X_2(\theta) + \cdots + X_N(\theta).$$

In this case, the probability $P_\theta(x)$ is calculated using the known formulas [Ney00].

Let $\delta$ be a function that assigns to each observed point $x$ a decision from the set $D$, that is, $\delta: X \to D$. We denote the set of all solving functions by $\mathcal{D} = D^X$.

Let

$$H(\delta, \theta) = \sum_{k=1}^{N} P_\theta(x_k)\, h(\delta(x_k), \theta) \qquad (1)$$

be the expected value of the decision maker's benefit if the decision maker uses the solving function $\delta$ and the test-taker knowledge level is $\theta$. If the function $h(d, \theta)$ is defined through a set of confidence intervals, then $H(\delta, \theta)$ is equal to the probability that the true value of the test-taker knowledge level lies in the required confidence interval. This interval is built from the observed point $x$ according to the solving function $\delta$.

Note that the lowest probability that the interval from the set $\Delta_1, \Delta_2, \ldots, \Delta_m$ selected by the solving function $\delta$ includes the unknown parameter $\theta$ is called the confidence probability for this set (for this solving function), that is,

$$\gamma = \gamma(\delta) = \min_{\theta \in \Theta} P\big(\theta \in \Delta(\delta(X_\theta))\big).$$

If the a priori distribution of knowledge levels $\nu$ is known, then the best solving function $\delta^*$ can be built according to this distribution:

$$H(\delta^*, \nu) = \max_{\delta} H(\delta, \nu).$$

This function is called the Bayesian solving function [Lut00].

If the a priori distribution is unknown, then the best solving function should be found from the solution of the statistical game $\Gamma = \langle \mathcal{D}, \Theta, H \rangle$, where $\mathcal{D}$ is the set of solving functions (the set of decision maker strategies), $\Theta$ is the set of possible test-taker knowledge levels (the set of states of nature), and $H$ is the decision maker's benefit function in the statistical game, whose values are found by formula (1).

To solve the matrix game, we form a pair of mutually dual problems. From the first problem we find the best randomized solving function $\mu = (\mu_1, \mu_2, \ldots, \mu_N)$, and from the second the worst a priori distribution $\nu$; the common optimal value of these problems is the value of the game $\Gamma$.

Direct problem:

$$v \to \max, \qquad \sum_{k=1}^{N} \Lambda_k B \mu_k \ge v \mathbf{1}_m,$$

$$\sum_{j=1}^{n} \mu_k^j = 1; \quad \mu_k^j \ge 0; \quad k = 1, \ldots, N; \quad j = 1, \ldots, n.$$

Dual problem:

$$v = \sum_{k=1}^{N} u_k \to \min, \qquad \nu^{T} \Lambda_k B \le u_k \mathbf{1}_n^{T}; \quad k = 1, \ldots, N; \qquad \sum_{i=1}^{m} \nu_i = 1.$$

There are many ways to solve linear programming problems. The most appropriate method here would be the dynamic method [Lut90], specially developed by the author for statistical games with threshold benefit functions. However, in the simplest cases, the statistical game can be solved using MS Excel. Although these methods often do not provide an exact solution, they always yield feasible solutions of both problems and, consequently, upper and lower bounds on the value of the matrix game.

2 Approbation

Let us assume that the test includes 10 questions and that the Statistician makes a decision based on the results of this test. The set of observations $X$ includes 11 numbers: from 0 to 10. The probability of a correct answer to one test question is equal to the test-taker knowledge level. The possible values of the test-taker knowledge level form the set $\Theta = \{0.95;\ 0.85;\ 0.75;\ 0.65;\ 0.55;\ 0.45;\ 0.35;\ 0.25;\ 0.15;\ 0.05\}$. Then the probability of correctly answering $x$ test items is calculated by the formula for Bernoulli trials:

$$P_\theta(x) = C_{10}^{x} \cdot \theta^{x} \cdot (1 - \theta)^{10 - x}, \qquad x = 0, \ldots, 10.$$

The Statistician assesses the knowledge level in the subject and assigns one of the following four grades: $D = \{A, B, C, D\}$. An excellent grade (A) is given to test-takers with 95% and 85% knowledge, good (B) from 75% to 55%, satisfactory (C) from 45% to 35%, and unsatisfactory (D) to the rest.

We form the statistical game $\Gamma = \langle \mathcal{D}, \Theta, H \rangle$ and solve it in mixed strategies. The benefit matrix of the game has size 44 × 10. Unfortunately, MS Excel tools do not allow the two mutually dual problems to be solved exactly. However, we obtain upper and lower assessments of the game value, a randomized decision function, and the worst a priori distribution of the parameter $\theta$ [Lut11].

As a result, we get the lower (0.519) and upper (0.562) assessments of the game value.

Table 1: Randomized decision function µ.

Decision   µ_10   µ_9    µ_8    µ_7    µ_6    µ_5    µ_4    µ_3    µ_2    µ_1    µ_0
A          1.00   0.49   0.75   0      0      0      0      0      0      0      0
B          0      0.51   0.24   0.95   0.75   0.70   0      0      0      0      0
C          0      0      0      0.05   0.25   0.30   1.00   1.00   0      0      0
D          0      0      0      0      0      0      0      0      1.00   1.00   1.00

The columns in this table give the probabilities with which the Statistician chooses a particular decision depending on the observation.

So, the probability of a correct decision by the Statistician about the test-taker knowledge level based on the test results is in the range from 0.52 to 0.56. Thus, in the worst case, the Statistician will make an incorrect decision about the test-taker knowledge level in almost half of the cases (44% to 48%) [Sha13].

Table 2: Worst a priori distribution of the parameter θ.

θ_i   0.95   0.85   0.75   0.65   0.55   0.45   0.35   0.25   0.15   0.05
ν_i   nonzero entries 0.11, 0.01, 0.04, 0.25, 0.19, 0.15, 0.26; the remaining three entries are 0

The resulting game values are a lower assessment and can be improved with a known a priori distribution. In addition, it seems unlikely that the a priori distribution of knowledge levels coincides with the worst a priori distribution [Sha14].

Although the above formulation does not take into account all the features of the testing organization, it can be refined if necessary. However, the value of the game will not improve much if more items are added to the test. Similar examples are considered in [Lut14].

3 Rasch model

The modern method of assessing the test-takers' knowledge level is based on Item Response Theory (IRT) [Lin97]. Let us list the main assumptions of this theory.

• Each test-taker has a certain knowledge level $\theta$ from the set of possible (acceptable) levels $\Theta \subseteq \mathbb{R}$.
• Each test item $\tau$ is assigned a characteristic function $p_\tau(\theta)$. Its value is the probability that the item is completed by a test-taker with knowledge level $\theta$. Obviously, $0 \le p_\tau(\theta) \le 1$ for $\theta \in \Theta$.
• The assessment of the test-taker knowledge level is based on the result of performing $N$ items $\tau_1, \tau_2, \ldots, \tau_N$ with the characteristic functions $p_{\tau_1}(\theta), p_{\tau_2}(\theta), \ldots, p_{\tau_N}(\theta)$.
• The difficulty of the item $\tau$ and the knowledge level of the test-taker $\theta$ can be measured in the same units, so the difference $\tau - \theta$ shows by how much the difficulty of the item exceeds the test-taker knowledge level [Lut15].

In Item Response Theory it is assumed that the probability that a test-taker with knowledge level $\theta$ correctly completes an item of difficulty $\tau$ is equal to

$$p_\tau(\theta) = p(\theta - \tau) = \big(1 + \exp(-(\theta - \tau))\big)^{-1} \quad \text{(Rasch model)}.$$

We now turn to the general case of parameter assessment in the Rasch model. Suppose that $n$ test-takers take a test $T$ containing $N$ items of difficulty $\tau_1 < \tau_2 < \ldots < \tau_N$. Then the probability that the $i$-th test-taker completed the $j$-th item of the test is equal to

$$p_{i,j} = \big(1 + \exp(-(\theta_i - \tau_j))\big)^{-1}, \qquad \tau_j \in \mathbb{R}.$$

Let $c_j$ be the number of participants who correctly completed item number $j$ (the number of initial points of the $j$-th item), and let $b_i$ be the number of items correctly completed by participant number $i$. (As a rule, these are all integers from 0 to $N$ inclusive.) Assessments $\hat\theta_0, \hat\theta_1, \ldots, \hat\theta_N;\ \hat\tau_0, \hat\tau_1, \ldots, \hat\tau_N$ of the corresponding parameters can be obtained by the method of moments or by the method of maximum likelihood. To do this, one needs to solve the system of equations

$$\begin{cases} \sum_{j=1}^{N} p_{i,j} = b_i, & i = 1, \ldots, n; \\ \sum_{i=1}^{n} p_{i,j} = c_j, & j = 1, \ldots, N. \end{cases} \qquad (2)$$

The possible values of the right-hand sides of this system (the numbers $b_i$) are integers from 0 to $N$. So system (2) consists of $2N + 1$ equations and contains $2N + 1$ unknowns.

Conclusion

In this paper, the problem of calculating the reliability of decisions made on the basis of test results was posed and solved. The solution of the statistical game was found: the optimal randomized decision rule (the best assessment of the test-taker knowledge level), the probability of a correct decision under this rule, and the worst a priori distribution of the test-takers' knowledge levels. The advantage of this approach is that we do not impose any restrictions on the distribution of test-taker types and that the solution of these statistical games is obtained by standard methods. In addition, the resulting solution is quite robust to small changes in the problem conditions.

References

[Lin97] van der Linden W.J., Hambleton R.K. Handbook of Modern Item Response Theory. Springer-Verlag, New York, 1997. 510 p.

[Lut90] Lutsenko M.M. Game theoretic method for assessment the parameter of the binomial distribution, Probability theory and its applications. 1990, No. 3. Pp. 471-481.

[Lut00] Lutsenko M.M., Ivanov M.A. Minimax confidence intervals for the parameter of a hypergeometric distribution, Automation and remote control. 2000, No. 7. Pp. 1125-1132.

[Lut03] Lutsenko M.M., Maloshevskii S.G. Minimax confidence intervals for the binomial parameter, Journal of statistical planning and inference. 2003, No. 1. Pp. 67-77.

[Lut11] Lutsenko M.M., Shadrinceva N.V. Educational Testing Accuracy, News of St. Petersburg State Transport University. 2011, No. 4(29). Pp. 250-258.

[Lut14] Lutsenko M.M., Seytmanbitov D.A. Test explicitly in Rasch model, Proceedings of the international banking institute. 2014. Pp. 114-116.

[Lut15] Lutsenko M., Seytmanbitov D. Game-theory Method for Knowledge Assessment, SING11-GTM2015 European Meeting on Game Theory. 2015. Pp. 125-126.

[Ney00] Neyman Yu.M., Khlebnikov V.A. Introduction to the theory of modeling and parameterization of pedagogical tests. 2000. 168 p.

[Sha13] Shadrinceva N.V., Seytmanbitov D.A. About the reliability of testing in the Rasch model, Mathematical modeling in education, science, and manufacturing. 2013. Pp. 156-157.

[Sha14] Shadrinceva N.V., Seytmanbitov D.A. Reliability of testing in the Rasch model, Institute of information technology and management SPBSTU. 2014.
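
As a numerical illustration of Section 2 (Approbation), the following minimal sketch evaluates formula (1) for the randomized decision function of Table 1 and takes the minimum over the knowledge levels. It assumes a particular reading of Table 1, in which column µ_x holds the probabilities of grades A to D after x correct answers; the names `THETA`, `GRADE_OF`, `MU`, `binomial_pmf`, `prob_correct` and `guaranteed_probability` are illustrative and do not come from the paper. It is not the dynamic method of [Lut90] and does not solve the dual linear programs.

```python
from math import comb

N_ITEMS = 10

# Knowledge levels from Section 2.
THETA = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]

# Correct grade for each level, following the grading rule of Section 2.
GRADE_OF = {
    0.95: "A", 0.85: "A",
    0.75: "B", 0.65: "B", 0.55: "B",
    0.45: "C", 0.35: "C",
    0.25: "D", 0.15: "D", 0.05: "D",
}

# ASSUMPTION: Table 1 read column by column; MU[x][grade] is the probability
# of assigning `grade` after observing x correct answers.
MU = {
    10: {"A": 1.00},
    9: {"A": 0.49, "B": 0.51},
    8: {"A": 0.75, "B": 0.24},
    7: {"B": 0.95, "C": 0.05},
    6: {"B": 0.75, "C": 0.25},
    5: {"B": 0.70, "C": 0.30},
    4: {"C": 1.00},
    3: {"C": 1.00},
    2: {"D": 1.00},
    1: {"D": 1.00},
    0: {"D": 1.00},
}


def binomial_pmf(x, theta, n=N_ITEMS):
    """P_theta(x) for n Bernoulli trials with success probability theta."""
    return comb(n, x) * theta ** x * (1.0 - theta) ** (n - x)


def prob_correct(theta):
    """H(mu, theta): probability that the assigned grade is the correct one."""
    grade = GRADE_OF[theta]
    return sum(
        binomial_pmf(x, theta) * MU[x].get(grade, 0.0)
        for x in range(N_ITEMS + 1)
    )


def guaranteed_probability():
    """Least favourable level and the probability guaranteed for every level."""
    per_level = {theta: prob_correct(theta) for theta in THETA}
    worst = min(per_level, key=per_level.get)
    return worst, per_level[worst]


worst_theta, gamma = guaranteed_probability()
print(f"worst level: {worst_theta}, guaranteed probability: {gamma:.3f}")
```

Under this reading of Table 1, the minimum is attained at θ = 0.45 and is approximately 0.519, which agrees with the lower assessment of the game value reported in Section 2.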
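
For items of different difficulty, Section 1 writes the score as the sum X(θ) = X_1(θ) + ... + X_N(θ), and Section 3 gives the Rasch probability of completing a single item. The sketch below combines the two by convolving the item outcomes one at a time. It assumes that the items are answered independently, which the paper does not state explicitly; the function names and the example difficulties are hypothetical, and this is not necessarily the formula referred to in [Ney00].

```python
from math import exp


def rasch_probability(theta, tau):
    """Rasch model: probability that level theta completes an item of difficulty tau."""
    return 1.0 / (1.0 + exp(-(theta - tau)))


def score_distribution(theta, difficulties):
    """Distribution of the total score X(theta) over 0..N.

    ASSUMPTION: item outcomes are independent, so the distribution is
    obtained by adding one item at a time.
    """
    dist = [1.0]  # with no items, the score is 0 with probability 1
    for tau in difficulties:
        p = rasch_probability(theta, tau)
        updated = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            updated[score] += prob * (1.0 - p)   # item not completed
            updated[score + 1] += prob * p       # item completed
        dist = updated
    return dist


# Hypothetical example: three items of increasing difficulty, level theta = 0.
example = score_distribution(0.0, [-1.0, 0.0, 1.0])
print([round(value, 4) for value in example])
```

When all difficulties are equal, the result reduces to the binomial distribution used for Bernoulli trials in Section 1.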