Reliability of Decision in Testing Problems

      Mikhail M. Lutsenko                           Dzhemil A. Seytmanbitov                    Anatoly M. Baranovskiy
     Emperor Alexander I St.                         Emperor Alexander I St.                    Emperor Alexander I St.
    Petersburg State Transport                      Petersburg State Transport                 Petersburg State Transport
            University                                      University                                 University
     Saint Petersburg, Russia                        Saint Petersburg, Russia                   Saint Petersburg, Russia
         ml4116@mail.ru                              dzhem93@gmail.com                             bamvka@mail.ru


Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: A. Khomonenko, B. Sokolov, K. Ivanova (eds.): Selected Papers of the Models and Methods of Information Systems Research Workshop, St. Petersburg, Russia, 4-5 Dec. 2019, published at http://ceur-ws.org

Abstract
In this paper, a statistical game is defined and solved. Its solution consists of the optimal randomized decision rule, the probability of a correct decision under this rule, and the worst a priori distribution of the test-takers' knowledge levels. We have developed a method for assessing the accuracy and reliability of decisions made on the basis of test results. The proposed program assesses the reliability of the decision for a test containing 10 items of different difficulty and 11 different knowledge levels.

Introduction
The main purpose of any testing is to assess the test-takers' knowledge level and to make a decision based on the result. Unfortunately, the test result (the number of completed test items) depends not only on the test-takers' knowledge levels but also on many other factors that are hard to predict. An adequate model of the decision-making problem must therefore include probabilistic components. We need a flexible model for building optimal randomized decisions that takes into account various types of decisions, a priori distributions of knowledge levels, and different item difficulties. Such a model can be built within the framework of statistical game theory [Lin97].

1    Methods and Algorithms
Let $\Theta = \{\theta_1, \theta_2, \dots, \theta_m\}$ be the set of possible test-taker knowledge levels, and let $X(\theta)$ be a random variable: the number of points scored on test $T$ by a test-taker with knowledge level $\theta$. Let $D = \{d_1, d_2, \dots, d_n\}$ be the set of possible decisions that the decision maker can make based on the test results. Examples of such decision sets are the following.

Accurate assessment of the test-taker knowledge level:

    $d_1 = \theta_1,\ d_2 = \theta_2,\ \dots,\ d_n = \theta_n$.

(Decision $d_j$ states that the test-taker knowledge level is $\theta_j$.)

Interval assessment of the test-taker knowledge level:

    $d_1 = \Delta_1,\ d_2 = \Delta_2,\ \dots,\ d_m = \Delta_m$,

where $\Delta_1, \Delta_2, \dots, \Delta_m \subseteq \Theta$ is a set of partially intersecting intervals. Their elements can be interpreted as poorly prepared test-takers, satisfactorily prepared test-takers, etc., or as test-takers ready to execute task 1, task 2, etc. (Decision $d_j$ states that the interval $\Delta_j$ includes the test-taker knowledge level $\theta$.)

Let $h(d, \theta)$ be the benefit of the decision maker when the decision $d$ is made and the knowledge level is $\theta$. Many other decision sets can be modeled through the benefit function. For example, the decision maker benefit function

    $h(d, \theta) = \begin{cases} 1, & \text{if } \theta \in \Delta(d), \\ 0, & \text{otherwise} \end{cases}$

is built on any set of confidence intervals $\Delta_1, \Delta_2, \dots, \Delta_m \subseteq \Theta$ [Lut03].

Let $X = \{1, 2, \dots, N\}$ be the set of possible values of the random variable $X(\theta)$, and let $P_\theta(x) = P(X(\theta) = x)$ be the probability of the corresponding event. If the difficulties of all test items are the same, these probabilities are the probabilities for Bernoulli trials, $P_\theta(x) = C_N^x p^x (1 - p)^{N - x}$. When the difficulties of the items are different, we define a random variable $X_k(\theta)$. Its value is one if the test-taker with knowledge level $\theta$ completed the $k$-th item of the test and zero otherwise. Let $p(k, \theta) = P(X_k(\theta) = 1)$ be the probability of the corresponding event. Then the random variable $X(\theta)$ is equal to the sum of these random variables:

    $X(\theta) = X_1(\theta) + X_2(\theta) + \dots + X_N(\theta)$.
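One standard way to obtain the distribution of such a sum $X(\theta) = X_1(\theta) + \dots + X_N(\theta)$ is a recursion over the items. The following is a minimal sketch, assuming the item outcomes are independent; the function name `score_distribution` and the item probabilities are illustrative and are not taken from the paper.

```python
from math import comb


def score_distribution(p):
    """Distribution of X = X_1 + ... + X_N for independent 0/1 items.

    p[k] plays the role of p(k, theta): the assumed probability that
    item k is completed. Returns dist with dist[x] = P(X = x), x = 0..N.
    """
    dist = [1.0]  # with no items the score is 0 with probability 1
    for pk in p:
        nxt = [0.0] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            nxt[x] += prob * (1.0 - pk)  # item not completed: score stays x
            nxt[x + 1] += prob * pk      # item completed: score becomes x + 1
        dist = nxt
    return dist


# Equal difficulties: the recursion should agree with the binomial formula.
equal = score_distribution([0.7] * 10)
binom = [comb(10, x) * 0.7**x * 0.3 ** (10 - x) for x in range(11)]
print(max(abs(a - b) for a, b in zip(equal, binom)))

# Hypothetical completion probabilities of five items for one knowledge level.
mixed = score_distribution([0.9, 0.8, 0.6, 0.5, 0.3])
print([round(v, 4) for v in mixed])
```

With equal item probabilities the recursion reduces to the Bernoulli-trial case, which gives a quick consistency check for any table of $p(k, \theta)$ values.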
In this case, the probability $P_\theta(x)$ is calculated using the known formulas [Ney00].

Let $\delta$ be a function that assigns to each observed point $x$ a decision from the set $D$, that is, $\delta: X \to D$. We denote the set of all decision functions by $\mathcal{D} = D^X$.

Let

    $H(\delta, \theta) = \sum_{k=1}^{N} P_\theta(x_k)\, h(\delta(x_k), \theta)$    (1)

be the expected value of the decision maker benefit when the decision function $\delta$ is used and the test-taker knowledge level is $\theta$. If the function $h(d, \theta)$ is defined through a set of confidence intervals, then $H(\delta, \theta)$ is equal to the probability that the true value of the test-taker knowledge level lies in the required confidence interval. This interval is built from the observed point $x$ according to the decision function $\delta$.

The lowest probability that the set $\Delta_1, \Delta_2, \dots, \Delta_m$ generated by the decision function $\delta$ includes the unknown parameter $\theta$ is called the confidence probability of this set (of this decision function), that is,

    $\gamma = \gamma(\delta) = \min_{\theta \in \Theta} P(\theta \in \Delta(\delta(X(\theta))))$.

If the a priori distribution $\nu$ of knowledge levels is known, then the best decision function $\delta_\nu$ can be built according to this distribution:

    $H(\delta_\nu, \nu) = \max_{\delta} H(\delta, \nu)$;

this function is called the Bayesian decision function [Lut00].

If the a priori distribution is unknown, then the best decision function should be found from the solution of the statistical game $\Gamma = \langle \mathcal{D}, \Theta, H \rangle$, where $\mathcal{D}$ is the set of decision functions (the set of decision maker strategies), $\Theta$ is the set of possible test-taker knowledge levels (the set of states of nature), and $H$ is the decision maker benefit function, whose values are found by formula (1).

To solve the matrix game, we form a pair of mutually dual problems. From the first problem we find the best randomized decision function $\mu = (\mu_1, \mu_2, \dots, \mu_N)$, from the second the worst a priori distribution $\nu$, and the common optimal value of the two problems is the value of the game $\Gamma$.

Direct problem:

    $v \to \max$,
    $\sum_{k=1}^{N} \Lambda_k B \mu_k \ge v \mathbf{1}_m$,
    $\sum_{j=1}^{n} \mu_k^j = 1; \quad \mu_k^j \ge 0; \quad k = \overline{1, N}; \quad j = \overline{1, n}$.

Dual problem:

    $v = \sum_{k=1}^{N} u_k \to \min$,
    $\nu^T \Lambda_k B \le u_k \mathbf{1}_n^T; \quad k = \overline{1, N}; \quad \sum_{i=1}^{m} \nu_i = 1$.

There are many ways to solve linear programming problems. The most appropriate method here would be the dynamic method [Lut90], specially developed by the author for statistical games with threshold benefit functions. However, in the simplest cases, the statistical game can be solved using MS Excel. Although these methods often do not provide an exact solution, they always yield feasible solutions of both problems and, consequently, upper and lower bounds on the value of the matrix game.

2   Approbation
Let us assume that the test includes 10 questions and that the Statistician makes a decision based on the results of this test. The set of observations $X$ includes 11 numbers: from 0 to 10. The probability of a correct answer to one test question is equal to the test-taker knowledge level. The possible values of the test-taker knowledge level form the set $\Theta$ = {0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05}. Then the probability of correctly answering $x$ test items is calculated as the probability for Bernoulli trials:

    $P_\theta(x) = C_{10}^{x} \cdot \theta^{x} \cdot (1 - \theta)^{10 - x}, \quad x = \overline{0, 10}$.

The Statistician assesses the knowledge level in the subject and assigns one of the following four grades: D = {A, B, C, D}. An excellent grade (A) is given to test-takers with 95% and 85% knowledge, good (B) from 75% to 55%, satisfactory (C) from 45% to 35%, and unsatisfactory (D) to the rest.

Let us form the statistical game $\Gamma = \langle D, \Theta, H \rangle$ and solve it in mixed strategies. The benefit matrix of the game has size 44 × 10. Unfortunately, MS Excel tools do not allow the two mutually dual problems to be solved exactly. However, we obtain upper and lower estimates of the game value, a randomized decision function, and the worst a priori distribution of the parameter $\theta$ [Lut11]. As a result, we get the lower (0.519) and upper (0.562) estimates of the game value.

Table 1: Randomized decision function µ.

Decision  µ_10   µ_9    µ_8    µ_7    µ_6    µ_5    µ_4    µ_3    µ_2    µ_1    µ_0
A         1.00   0.49   0.75   0      0      0      0      0      0      0      0
B         0      0.51   0.24   0.95   0.75   0.70   0      0      0      0      0
C         0      0      0      0.05   0.25   0.30   1.00   1.00   0      0      0
D         0      0      0      0      0      0      0      0      1.00   1.00   1.00
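The grading example of this section (10 items, ten knowledge levels, four grades) can also be bounded numerically without a spreadsheet. The sketch below does not use the dynamic method [Lut90] or a linear programming solver; it swaps in Brown-Robinson fictitious play, which, like the approximate methods mentioned above, yields a valid lower and upper bound on the game value after any number of rounds. It assumes that the randomized rule assigns a probability to each pair (score, grade), 11 × 4 = 44 in total, and that the benefit is 1 for a correct grade and 0 otherwise; this reading and all identifiers in the code are assumptions, not the authors' program.

```python
from math import comb

N_ITEMS = 10
THETAS = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
GRADES = "ABCD"
# Assumed correct grade per level: A for 0.95-0.85, B for 0.75-0.55,
# C for 0.45-0.35, D for the rest (indices into GRADES).
CORRECT = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
SCORES = range(N_ITEMS + 1)

# P[i][x]: binomial probability that level THETAS[i] answers x items correctly.
P = [[comb(N_ITEMS, x) * t**x * (1 - t) ** (N_ITEMS - x) for x in SCORES]
     for t in THETAS]


def bayes_rule(weights):
    """Best grade for each score against unnormalised prior weights.

    Returns the rule and its probability of a correct decision.
    """
    rule, value = [], 0.0
    for x in SCORES:
        mass = [0.0] * len(GRADES)
        for i, w in enumerate(weights):
            mass[CORRECT[i]] += w * P[i][x]
        best = max(range(len(GRADES)), key=mass.__getitem__)
        rule.append(best)
        value += mass[best]
    return rule, value / sum(weights)


def correct_probabilities(counts, rounds):
    """H(mu, theta_i) for the averaged rule mu = counts / rounds."""
    return [sum(P[i][x] * counts[x][CORRECT[i]] for x in SCORES) / rounds
            for i in range(len(THETAS))]


def fictitious_play(rounds=5000):
    weights = [1.0] * len(THETAS)                 # how often nature chose each level
    counts = [[0] * len(GRADES) for _ in SCORES]  # how often each grade was chosen
    lower, upper, best_mu = 0.0, 1.0, None
    for t in range(1, rounds + 1):
        rule, bayes_value = bayes_rule(weights)
        upper = min(upper, bayes_value)           # Bayes value of any prior >= game value
        for x, d in enumerate(rule):
            counts[x][d] += 1
        per_level = correct_probabilities(counts, t)
        worst = min(range(len(THETAS)), key=per_level.__getitem__)
        if best_mu is None or per_level[worst] > lower:
            lower = per_level[worst]              # guarantee of any rule <= game value
            best_mu = [[c / t for c in row] for row in counts]
        weights[worst] += 1.0
    return lower, upper, best_mu


lower, upper, mu = fictitious_play()
print(f"game value lies between {lower:.3f} and {upper:.3f}")
for g, name in enumerate(GRADES):
    # Same column order as Table 1: from score 10 down to score 0.
    print(name, [round(mu[x][g], 2) for x in reversed(SCORES)])
```

The minimum over levels computed in `correct_probabilities` is the confidence probability of the averaged rule, so `lower` is a guarantee that holds for every a priori distribution. Fictitious play converges slowly; more rounds tighten the gap, while an exact solution requires solving the pair of dual problems directly.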
The columns of Table 1 indicate the probabilities with which the Statistician chooses a particular decision depending on the observation.
So, the probability of a correct decision of the Statistician about the test-taker knowledge level based on the test results is in the range from 0.52 to 0.56. Thus, in almost half of the cases (about 44-48%), the Statistician will make an incorrect decision about the test-taker knowledge level [Sha13].

Table 2: Worst a priori distribution of the parameter θ.

θ_i   0.95   0.85   0.75   0.65   0.55   0.45   0.35   0.25   0.15   0.05
ν_i   0      0.11   0.01   0.04   0.25   0.19   0.15   0.26   0      0

The resulting game values are lower estimates and can be improved when the a priori distribution is known. In addition, it seems unlikely that the actual a priori distribution of knowledge levels coincides with the worst a priori distribution [Sha14].
Although the above formulation does not take into account all the features of the testing organization, it can be refined if necessary. However, the value of the game will not improve much if more items are added to the test. Similar examples are considered in [Lut14].

3 Rasch model
The modern method of assessing the test-takers' knowledge level is based on Item Response Theory (IRT) [Lin97]. Let us list the main assumptions of this theory.
•       Each test-taker has a certain knowledge level $\theta$ from the set of possible (acceptable) levels $\Theta \subseteq \mathbb{R}$.
•       Each item $\tau$ of the test is assigned an item characteristic function $p_\tau(\theta)$. Its value is the probability that the item is completed by a test-taker with knowledge level $\theta$. Obviously, $0 \le p_\tau(\theta) \le 1$ for $\theta \in \Theta$.
•       The assessment of the test-taker knowledge level is based on the result of performing $N$ items $\tau_1, \tau_2, \dots, \tau_N$ with the characteristic functions $p_{\tau_1}(\theta), p_{\tau_2}(\theta), \dots, p_{\tau_N}(\theta)$.
•       The difficulty $\tau$ of an item and the knowledge level $\theta$ of a test-taker can be measured in the same units, so the difference $\tau - \theta$ shows by how much the difficulty of the item exceeds the test-taker knowledge level [Lut15].
In Item Response Theory it is assumed that the probability that a test-taker with knowledge level $\theta$ correctly completes an item of difficulty $\tau$ is equal to

    $p_\tau(\theta) = p(\theta - \tau) = (1 + \exp(-(\theta - \tau)))^{-1}$    (Rasch model).

We now turn to the general case of parameter estimation in the Rasch model. Suppose that $n$ test-takers take a test $T$ containing $N$ items of difficulty $\tau_1 < \tau_2 < \dots < \tau_N$. Then the probability that the $i$-th test-taker completed the $j$-th item of the test is equal to

    $p_{i,j} = (1 + \exp(-(\theta_i - \tau_j)))^{-1}, \quad \tau_j \in \mathbb{R}$.

Let $c_j$ be the number of participants who correctly performed item number $j$ (the initial score of the $j$-th item), and let $b_i$ be the number of items correctly completed by participant number $i$. (As a rule, these are all integers from 0 to $N$ inclusive.) Estimates $\hat\theta_0, \hat\theta_1, \dots, \hat\theta_N; \hat\tau_0, \hat\tau_1, \dots, \hat\tau_N$ of the corresponding parameters can be obtained by the method of moments or by the maximum likelihood method. To do this, one needs to solve the system of equations

    $\begin{cases} \sum_{j=1}^{N} p_{i,j} = b_i, & i = \overline{1, n}; \\ \sum_{i=1}^{n} p_{i,j} = c_j, & j = \overline{1, N}. \end{cases}$    (2)

The possible values of the right-hand sides of this system (the numbers $b_i$) are integers from 0 to $N$. So system (2) consists of 2N+1 equations and contains 2N+1 unknowns.

Conclusion
In this paper, the problem of calculating the reliability of decisions made on the basis of testing results was stated and solved. The solution of the statistical game yields the optimal randomized decision rule (the best assessment of the test-taker knowledge level), the probability of a correct decision under this rule, and the worst a priori distribution of the test-takers' knowledge levels. The advantage of this approach is that we do not impose any restrictions on the distribution of test-taker types and that the solution of these statistical games is obtained by standard methods. In addition, the resulting solution is quite robust to small changes in the problem conditions.

References

[Lin97] van der Linden W.J., Hambleton R.K. Handbook
        of Modern Item Response Theory. Springer-Verlag,
        New York, 1997. 510 p.

[Lut90] Lutsenko M.M. Game theoretic method for
        assessment of the parameter of the binomial
        distribution, Probability theory and its
        applications. 1990, №3. Pp. 471-481.

[Lut00] Lutsenko M.M., Ivanov M.A. Minimax
        confidence intervals for the parameter of a
        hypergeometric distribution, Automation and
        remote control. 2000, №7. Pp. 1125-1132.

[Lut03] Lutsenko M.M., Maloshevskii S.G. Minimax
        confidence intervals for the binomial parameter,
        Journal of statistical planning and inference.
        2003, №1. Pp. 67-77.

[Lut11] Lutsenko M.M., Shadrinceva N.V. Educational
        Testing Accuracy, News of St. Petersburg State
        Transport University. 2011, №4(29). Pp. 250-
        258.

[Lut14] Lutsenko M.M., Seytmanbitov D.A. Test
        explicitly in Rasch model, Proceedings of the
        international banking institute. 2014. Pp. 114-
        116.

[Lut15] Lutsenko M., Seytmanbitov D., Game-theory
        Method for Knowledge Assessment, SING11-
        GTM2015 European Meeting on Game
        Theory. 2015. Pp. 125-126.

[Ney00] Neyman Yu.M., Khlebnikov V.A.: Introduction
        to the theory of modeling and parameterization
        of pedagogical tests. 2000. 168 p.

[Sha13] Shadrinceva N.V., Seytmanbitov D.A. About
        the reliability of testing in the Rasch model,
        Mathematical modeling in education, science,
        and manufacturing. 2013. Pp. 156-157.

[Sha14] Shadrinceva N.V., Seytmanbitov D.A.
        Reliability of testing in the Rasch model,
        Institute of information technology and
        management SPBSTU. 2014.

