<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reliability of Decision in Testing Problems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mikhail M. Lutsenko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dzhemil A. Seytmanbitov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anatoly M. Baranovskiy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Emperor Alexander I St. Petersburg State Transport University</institution>
          ,
          <addr-line>Saint Petersburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>59</fpage>
      <lpage>62</lpage>
      <abstract>
        <p>In this paper, a statistical game is defined and solved. Its solution consists of the optimal randomized decision rule, the probability of a correct decision under this rule, and the worst a priori distribution of the test-takers' knowledge levels. We have developed a method for assessing the accuracy and reliability of decisions made on the basis of test results. The proposed program makes it possible to assess the reliability of the decision for a test containing 10 items with different levels of difficulty and 11 different knowledge levels.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The main purpose of any testing is to assess the test-takers' knowledge level and to make a decision based on the result. Unfortunately, the test result (the number of completed test items) depends not only on the test-taker's knowledge level but also on many other factors that are hard to predict. Therefore, an adequate model of the decision-making problem must include probabilistic components. We need a flexible model for building optimal randomized decisions that takes into account various types of decisions, a priori distributions over knowledge levels, and different item difficulties. Such a model can be built within statistical game theory [Lin97].</p>
    </sec>
    <sec id="sec-2">
      <title>Methods and Algorithms</title>
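      <p>Let <inline-formula><tex-math>\Theta = \{\theta_1, \theta_2, \dots, \theta_m\}</tex-math></inline-formula> be the set of possible knowledge levels of the test-takers, and let <inline-formula><tex-math>\xi(\theta)</tex-math></inline-formula> be a random variable: the number of points scored on the test by a test-taker with knowledge level θ. Let <inline-formula><tex-math>D = \{d_1, d_2, \dots, d_n\}</tex-math></inline-formula> be the set of possible decisions that the decision maker can make based on the test results.</p>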
      <p>Typical examples of such decision sets are the following.</p>
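      <p>Exact assessment of the test-taker's knowledge level: <inline-formula><tex-math>d_1 = \theta_1,\ d_2 = \theta_2,\ \dots,\ d_n = \theta_n</tex-math></inline-formula>, with <inline-formula><tex-math>n = m</tex-math></inline-formula> (decision <inline-formula><tex-math>d_j</tex-math></inline-formula> means that the test-taker's knowledge level is <inline-formula><tex-math>\theta_j</tex-math></inline-formula>).</p>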
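      <p>Interval assessment of the test-taker's knowledge level: <inline-formula><tex-math>d_1 = \Delta_1,\ d_2 = \Delta_2,\ \dots,\ d_n = \Delta_n</tex-math></inline-formula>, where <inline-formula><tex-math>\Delta_1, \Delta_2, \dots, \Delta_n \subseteq \Theta</tex-math></inline-formula> is a family of partially overlapping intervals. Their elements can be interpreted as poorly prepared test-takers, satisfactorily prepared test-takers, etc., or as test-takers ready to perform task 1, task 2, etc. (decision <inline-formula><tex-math>d_j</tex-math></inline-formula> means that the interval <inline-formula><tex-math>\Delta_j</tex-math></inline-formula> contains the test-taker's knowledge level θ).</p>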
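      <p>Let <inline-formula><tex-math>h(d, \theta)</tex-math></inline-formula> be the benefit of the decision maker when it makes the decision <inline-formula><tex-math>d</tex-math></inline-formula> and the knowledge level is θ. The benefit function makes it possible to model many other decision sets.</p>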
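      <p>For example, the decision maker's benefit function
        <disp-formula><tex-math>h(d, \theta) = \begin{cases} 1, &amp; \text{if } \theta \in \Delta(d), \\ 0, &amp; \text{otherwise}, \end{cases}</tex-math></disp-formula>
        where <inline-formula><tex-math>\Delta(d_j) = \Delta_j</tex-math></inline-formula>, is built on an arbitrary family of confidence intervals <inline-formula><tex-math>\Delta_1, \Delta_2, \dots, \Delta_n \subseteq \Theta</tex-math></inline-formula> [Lut03].</p>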
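      <p>Let <inline-formula><tex-math>X = \{0, 1, \dots, N\}</tex-math></inline-formula> be the set of possible values of the random variable <inline-formula><tex-math>\xi(\theta)</tex-math></inline-formula>, where <inline-formula><tex-math>N</tex-math></inline-formula> is the number of test items, and let <inline-formula><tex-math>P_\theta(x) = \mathrm{P}(\xi(\theta) = x)</tex-math></inline-formula> be the probability of the corresponding event. If all test items have the same difficulty, these probabilities can be calculated as for Bernoulli trials:
        <disp-formula><tex-math>P_\theta(x) = C_N^x\, p^x (1 - p)^{N - x},</tex-math></disp-formula>
        where <inline-formula><tex-math>p</tex-math></inline-formula> is the probability that a test-taker with knowledge level θ completes a single item. When the item difficulties differ, we define a random variable <inline-formula><tex-math>\xi_j(\theta)</tex-math></inline-formula> equal to one if the test-taker with knowledge level θ completed the <inline-formula><tex-math>j</tex-math></inline-formula>-th item of the test and to zero otherwise. Let <inline-formula><tex-math>p(j, \theta) = \mathrm{P}(\xi_j(\theta) = 1)</tex-math></inline-formula> be the probability of the corresponding event. Then the random variable <inline-formula><tex-math>\xi(\theta)</tex-math></inline-formula> equals the sum of the corresponding random variables:
        <disp-formula><tex-math>\xi(\theta) = \xi_1(\theta) + \xi_2(\theta) + \dots + \xi_N(\theta).</tex-math></disp-formula>
        In this case, the probability <inline-formula><tex-math>P_\theta(x)</tex-math></inline-formula> is calculated using known formulas [Ney00].</p>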
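      <p>The text refers to known formulas [Ney00] for <inline-formula><tex-math>P_\theta(x)</tex-math></inline-formula> without stating them. A minimal sketch, assuming that the item outcomes <inline-formula><tex-math>\xi_j(\theta)</tex-math></inline-formula> are independent, convolves the item distributions one item at a time (the standard recursion for a sum of independent Bernoulli variables, not necessarily the formula used in [Ney00]); with equal item probabilities it reproduces the binomial formula. The function names are illustrative.</p>
      <preformat>
```python
from math import comb


def score_distribution(item_probs):
    """Return [P(xi = 0), ..., P(xi = N)] for xi = xi_1 + ... + xi_N.

    item_probs[j] is p(j, theta), the probability of completing item j.
    Assumes the item outcomes are independent Bernoulli variables.
    """
    dist = [1.0]  # distribution of an empty sum: score 0 with probability 1
    for p in item_probs:
        new = [0.0] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            new[x] += prob * (1.0 - p)  # item failed: score stays x
            new[x + 1] += prob * p      # item completed: score becomes x + 1
        dist = new
    return dist


def binomial_distribution(n_items, p):
    """Equal difficulties: P(x) = C(N, x) p^x (1 - p)^(N - x)."""
    return [comb(n_items, x) * p**x * (1 - p) ** (n_items - x)
            for x in range(n_items + 1)]
```
      </preformat>
      <p>For a hypothetical three-item test with <monospace>item_probs = [0.9, 0.6, 0.3]</monospace>, the sketch returns approximately 0.028, 0.306, 0.504 and 0.162 for the scores 0 to 3; the recursion costs O(N²) operations per knowledge level.</p>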
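      <p>Let δ be a function that assigns a decision from the set <inline-formula><tex-math>D</tex-math></inline-formula> to each observed point <inline-formula><tex-math>x \in X</tex-math></inline-formula>, that is, <inline-formula><tex-math>\delta\colon X \to D</tex-math></inline-formula>.</p>
      <p>We denote the set of all solving functions by <inline-formula><tex-math>\mathcal{D} = D^{X}</tex-math></inline-formula>. Let
        <disp-formula><label>(1)</label><tex-math>H(\delta, \theta) = \sum_{x=0}^{N} P_\theta(x)\, h(\delta(x), \theta)</tex-math></disp-formula>
        be the expected benefit of the decision maker if it uses the solving function δ and the test-taker's knowledge level is θ. If the function <inline-formula><tex-math>h(d, \theta)</tex-math></inline-formula> is defined through a family of confidence intervals, then <inline-formula><tex-math>H(\delta, \theta)</tex-math></inline-formula> equals the probability that the true knowledge level of the test-taker lies in the selected confidence interval, that is, in the interval assigned to the observed point <inline-formula><tex-math>x</tex-math></inline-formula> by the solving function δ.</p>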
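      <p>Formula (1) is a finite sum, so <inline-formula><tex-math>H(\delta, \theta)</tex-math></inline-formula> is easy to tabulate. The sketch below assumes items of equal difficulty (binomial <inline-formula><tex-math>P_\theta(x)</tex-math></inline-formula>) and the threshold benefit <inline-formula><tex-math>h(d, \theta) = 1</tex-math></inline-formula> when <inline-formula><tex-math>\theta \in \Delta(d)</tex-math></inline-formula>; the interval rule <monospace>delta</monospace> is hypothetical and serves only as an illustration. With this benefit, <inline-formula><tex-math>H(\delta, \theta)</tex-math></inline-formula> is the coverage probability of the chosen interval, and its minimum over Θ is the confidence probability γ(δ).</p>
      <preformat>
```python
from math import comb


def binom_pmf(x, n_items, theta):
    """P_theta(x) for N items of equal difficulty (Bernoulli trials)."""
    return comb(n_items, x) * theta**x * (1 - theta) ** (n_items - x)


def threshold_benefit(interval, theta):
    """h(d, theta) = 1 if theta lies in the interval Delta(d), else 0."""
    low, high = interval
    return 1.0 if high >= theta >= low else 0.0


def expected_benefit(delta, theta, n_items, h=threshold_benefit):
    """Formula (1): H(delta, theta) = sum_x P_theta(x) * h(delta(x), theta)."""
    return sum(binom_pmf(x, n_items, theta) * h(delta(x), theta)
               for x in range(n_items + 1))


def delta(x):
    """Hypothetical rule: after x correct answers of 10, report [x/10 - 0.2, x/10 + 0.2]."""
    return (x / 10 - 0.2, x / 10 + 0.2)


levels = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
coverage = {t: expected_benefit(delta, t, 10) for t in levels}
gamma = min(coverage.values())  # confidence probability of this interval family
```
      </preformat>
      <p>For θ = 0.55 only the scores 4 to 7 yield an interval that contains θ, so <monospace>coverage[0.55]</monospace> is the binomial probability of those four scores.</p>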
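      <p>Note that the smallest probability with which the family of intervals <inline-formula><tex-math>\Delta_1, \Delta_2, \dots, \Delta_n</tex-math></inline-formula> generated by the solving function δ covers the unknown parameter θ is called the confidence probability of this family (of this solving function), that is,
        <disp-formula><tex-math>\gamma = \gamma(\delta) = \min_{\theta \in \Theta} \mathrm{P}\bigl(\theta \in \Delta(\delta(\xi(\theta)))\bigr).</tex-math></disp-formula>
      </p>
      <p>If the a priori distribution ν of knowledge levels is known, with <inline-formula><tex-math>\nu_k</tex-math></inline-formula> the prior probability of <inline-formula><tex-math>\theta_k</tex-math></inline-formula>, then the best solving function <inline-formula><tex-math>\delta_\nu</tex-math></inline-formula> can be built from this distribution:
        <disp-formula><tex-math>H(\delta_\nu, \nu) = \max_{\delta \in \mathcal{D}} H(\delta, \nu), \qquad H(\delta, \nu) = \sum_{k=1}^{m} \nu_k H(\delta, \theta_k);</tex-math></disp-formula>
        this function is called the Bayesian solving function [Lut00].</p>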
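      <p>Because <inline-formula><tex-math>H(\delta, \nu) = \sum_{x} \sum_{k} \nu_k P_{\theta_k}(x)\, h(\delta(x), \theta_k)</tex-math></inline-formula>, the maximum over δ can be taken separately for each observation: <inline-formula><tex-math>\delta_\nu(x)</tex-math></inline-formula> is any decision maximizing <inline-formula><tex-math>\sum_k \nu_k P_{\theta_k}(x)\, h(d, \theta_k)</tex-math></inline-formula>. The sketch below applies this to the grading setup of the Approbation section (10 equally difficult items, four grades); the uniform prior is a hypothetical choice, not a distribution from the paper. For any ν, the Bayes value <inline-formula><tex-math>H(\delta_\nu, \nu)</tex-math></inline-formula> is an upper bound on the value of the statistical game.</p>
      <preformat>
```python
from math import comb

LEVELS = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
N_ITEMS = 10
# Decision d is correct iff theta lies in GRADES[d] (grading of the Approbation section).
GRADES = {"A": {0.95, 0.85}, "B": {0.75, 0.65, 0.55},
          "C": {0.45, 0.35}, "D": {0.25, 0.15, 0.05}}


def pmf(x, theta):
    """P_theta(x): binomial probability of x correct answers out of N_ITEMS."""
    return comb(N_ITEMS, x) * theta**x * (1 - theta) ** (N_ITEMS - x)


def h(d, theta):
    """Threshold benefit: 1 for the correct grade, 0 otherwise."""
    return 1.0 if theta in GRADES[d] else 0.0


def bayes_rule(prior):
    """delta_nu(x) = argmax_d sum_k nu_k * P_{theta_k}(x) * h(d, theta_k), for each x."""
    return {x: max(GRADES, key=lambda d: sum(nu * pmf(x, t) * h(d, t)
                                           for nu, t in zip(prior, LEVELS)))
            for x in range(N_ITEMS + 1)}


def bayes_benefit(rule, prior):
    """H(delta, nu) = sum_k nu_k * H(delta, theta_k), with H from formula (1)."""
    return sum(nu * sum(pmf(x, t) * h(rule[x], t) for x in range(N_ITEMS + 1))
               for nu, t in zip(prior, LEVELS))


uniform = [1 / len(LEVELS)] * len(LEVELS)  # hypothetical prior
rule = bayes_rule(uniform)
value = bayes_benefit(rule, uniform)
```
      </preformat>
      <p>Under the uniform prior the rule assigns grade A to a perfect score and grade D to a score of zero; no constant grading rule can achieve a larger Bayes value.</p>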
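      <p>If the a priori distribution is unknown, then the best solving function should be found from the solution of the statistical game <inline-formula><tex-math>\Gamma = \langle \mathcal{D}, \Theta, H \rangle</tex-math></inline-formula>, where <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula> is the set of solving functions (the set of the decision maker's strategies), Θ is the set of possible knowledge levels of the test-takers (the set of states of nature), and <inline-formula><tex-math>H</tex-math></inline-formula> is the decision maker's benefit function in the statistical game, whose values are given by formula (1).</p>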
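      <p>To solve this matrix game, we formulate a pair of mutually dual linear programming problems. From the first (direct) problem we find the best randomized solving function <inline-formula><tex-math>\mu = (\mu_0, \mu_1, \dots, \mu_N)</tex-math></inline-formula>, where <inline-formula><tex-math>\mu_x = (\mu_{x1}, \dots, \mu_{xn})</tex-math></inline-formula> is the probability distribution over decisions used after observing <inline-formula><tex-math>x</tex-math></inline-formula>; from the second (dual) problem, the worst a priori distribution <inline-formula><tex-math>\nu = (\nu_1, \dots, \nu_m)</tex-math></inline-formula>. The common optimal value of these problems is the value of the game Γ.</p>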
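      <sec id="sec-2-1">
        <title>Direct problem:</title>
        <disp-formula><tex-math>v \to \max, \qquad \sum_{x=0}^{N} \Lambda_x \mu_x \ge v\,\mathbf{1}_m, \qquad \sum_{j=1}^{n} \mu_{xj} = 1, \qquad \mu_{xj} \ge 0, \qquad j = \overline{1,n},\ x = \overline{0,N}.</tex-math></disp-formula>
        <p>Here <inline-formula><tex-math>\Lambda_x</tex-math></inline-formula> is the <inline-formula><tex-math>m \times n</tex-math></inline-formula> matrix with entries <inline-formula><tex-math>(\Lambda_x)_{kj} = P_{\theta_k}(x)\, h(d_j, \theta_k)</tex-math></inline-formula>, so that the <inline-formula><tex-math>k</tex-math></inline-formula>-th component of <inline-formula><tex-math>\sum_x \Lambda_x \mu_x</tex-math></inline-formula> is the expected benefit of the randomized rule μ when the knowledge level is <inline-formula><tex-math>\theta_k</tex-math></inline-formula>; <inline-formula><tex-math>\mathbf{1}_m</tex-math></inline-formula> and <inline-formula><tex-math>\mathbf{1}_n</tex-math></inline-formula> are column vectors of ones.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Dual problem:</title>
        <disp-formula><tex-math>\sum_{x=0}^{N} u_x \to \min, \qquad \nu^{\mathsf{T}} \Lambda_x \le u_x \mathbf{1}_n^{\mathsf{T}}, \quad x = \overline{0,N}, \qquad \sum_{k=1}^{m} \nu_k = 1, \qquad \nu_k \ge 0.</tex-math></disp-formula>
        <p>There are many ways to solve linear programming problems. The most appropriate method here would be the dynamic method [Lut90], developed by the first author specifically for statistical games with threshold benefit functions. In the simplest cases, however, the statistical game can be solved using MS Excel. Although such tools often do not provide an exact solution, they always yield feasible solutions of both problems and, consequently, upper and lower bounds on the value of the matrix game.</p>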
      </sec>
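      <p>Any feasible point of either problem already bounds the value of the game: a randomized rule μ satisfying the constraints of the direct problem guarantees <inline-formula><tex-math>v = \min_k \bigl(\sum_x \Lambda_x \mu_x\bigr)_k</tex-math></inline-formula>, while for a distribution ν the smallest feasible <inline-formula><tex-math>u_x</tex-math></inline-formula> is <inline-formula><tex-math>\max_j (\nu^{\mathsf{T}} \Lambda_x)_j</tex-math></inline-formula>, so <inline-formula><tex-math>\sum_x u_x</tex-math></inline-formula> bounds the value from above. The sketch below checks this weak-duality argument on a deliberately tiny hypothetical game (two equally difficult items, two knowledge levels, two decisions); it illustrates the bounds and is not the authors' solver.</p>
      <preformat>
```python
from math import comb


def build_lambda(levels, decisions, n_items, h):
    """Lambda[x][k][j] = P_{theta_k}(x) * h(d_j, theta_k) for x = 0..N (binomial P)."""
    return [[[comb(n_items, x) * t**x * (1 - t) ** (n_items - x) * h(d, t)
              for d in decisions]
             for t in levels]
            for x in range(n_items + 1)]


def lower_bound(lam, mu):
    """Direct problem: a feasible mu guarantees v = min_k (sum_x Lambda_x mu_x)_k."""
    totals = [0.0] * len(lam[0])
    for lam_x, mu_x in zip(lam, mu):
        for k, row in enumerate(lam_x):
            totals[k] += sum(a * w for a, w in zip(row, mu_x))
    return min(totals)


def upper_bound(lam, nu):
    """Dual problem: for a given nu the best u_x is max_j (nu^T Lambda_x)_j."""
    n_dec = len(lam[0][0])
    return sum(max(sum(nu_k * lam_x[k][j] for k, nu_k in enumerate(nu))
                   for j in range(n_dec))
               for lam_x in lam)


# Hypothetical toy game: levels 0.8 and 0.3, decisions "high" and "low", N = 2.
levels, decisions = [0.8, 0.3], ["high", "low"]
lam = build_lambda(levels, decisions, 2,
                   lambda d, t: 1.0 if (d == "high") == (t > 0.5) else 0.0)
mu = [[0.5, 0.5]] * 3   # guess at random after every observation
nu = [0.5, 0.5]         # uniform prior over the two levels
low, high = lower_bound(lam, mu), upper_bound(lam, nu)
```
      </preformat>
      <p>Here the random guess guarantees 0.5 and the uniform prior gives the upper bound 0.775, so the value of this toy game lies between them; better feasible points narrow the interval.</p>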
    </sec>
    <sec id="sec-3">
      <title>Approbation</title>
      <p>Let us assume that the test includes 10 questions and that the Statistician makes a decision based on the results of this test. The set of observations <inline-formula><tex-math>X</tex-math></inline-formula> includes 11 numbers, from 0 to 10. The probability of a correct answer to one test question equals the test-taker's knowledge level. The possible knowledge levels are Θ = {0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05}.</p>
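      <p>Then the probability of correctly answering <inline-formula><tex-math>x</tex-math></inline-formula> test items is calculated as for Bernoulli trials:</p>
      <p><disp-formula><tex-math>P_\theta(x) = C_{10}^{x}\, \theta^{x} (1 - \theta)^{10 - x}, \qquad x = \overline{0,10}.</tex-math></disp-formula></p>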
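      <p>As a worked illustration of this formula (our own arithmetic, not taken from the paper's results), consider a score of 9 for the adjacent knowledge levels 0.85 and 0.75:</p>
      <disp-formula><tex-math>P_{0.85}(9) = C_{10}^{9} \cdot 0.85^{9} \cdot 0.15 \approx 0.347, \qquad P_{0.75}(9) = C_{10}^{9} \cdot 0.75^{9} \cdot 0.25 \approx 0.188.</tex-math></disp-formula>
      <p>A score of 9 is thus fairly likely both for a test-taker at level 0.85 and for one at level 0.75, which illustrates why a decision based on the score alone cannot always be correct.</p>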
      <p>The Statistician assesses the knowledge level in the subject by assigning one of four grades: A, B, C, or D. The excellent grade (A) is given to test-takers with knowledge levels of 95% and 85%, good (B) to levels from 75% to 55%, satisfactory (C) to levels from 45% to 35%, and unsatisfactory (D) to the rest.</p>
      <p>Let us formulate the statistical game <inline-formula><tex-math>\Gamma = \langle \mathcal{D}, \Theta, H \rangle</tex-math></inline-formula> and solve it in mixed strategies. The benefit matrix of the game has size 44 × 10 (11 observations times 4 grades, by 10 knowledge levels). Unfortunately, MS Excel tools do not allow us to solve the two mutually dual problems exactly. However, we obtain upper and lower estimates of the game value, a randomized decision function, and the worst a priori distribution of the parameter θ [Lut11].</p>
      <p>As a result, we obtain the lower (0.519) and upper (0.562) estimates of the game value. The randomized decision function specifies the probabilities with which the Statistician indicates a particular decision depending on the observation.</p>
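      <p>The estimates above come from MS Excel. As an alternative sketch (a swapped-in technique, not the authors' method), Brown–Robinson fictitious play yields valid bounds for the same game without listing all <inline-formula><tex-math>4^{11}</tex-math></inline-formula> pure solving functions: the decision maker's best reply to a prior is the Bayes rule, computed separately for each observation, and nature's best reply is the knowledge level with the smallest expected benefit. The Bayes value against the empirical prior bounds the game value from above, and the guaranteed benefit of the averaged rule bounds it from below. The iteration count is an arbitrary assumption, and no claim is made that this code reproduces 0.519 and 0.562 exactly.</p>
      <preformat>
```python
from math import comb

LEVELS = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
GRADE_OF = ["A", "A", "B", "B", "B", "C", "C", "D", "D", "D"]  # correct grade per level
DECISIONS = ["A", "B", "C", "D"]
OBS = range(11)  # x = 0..10 correct answers
# PMF[k][x] = P_{theta_k}(x) for 10 equally difficult items.
PMF = [[comb(10, x) * t**x * (1 - t) ** (10 - x) for x in OBS] for t in LEVELS]


def best_rule(weights):
    """Bayes rule for (unnormalized) weights on the knowledge levels."""
    return [max(DECISIONS, key=lambda d: sum(w * PMF[k][x]
                                            for k, w in enumerate(weights)
                                            if GRADE_OF[k] == d))
            for x in OBS]


def benefits(rule):
    """H(rule, theta_k) for every k: probability that the assigned grade is correct."""
    return [sum(PMF[k][x] for x in OBS if rule[x] == GRADE_OF[k])
            for k in range(len(LEVELS))]


def fictitious_play(iterations):
    """Return (lower, upper) bounds on the game value after the given iterations."""
    m = len(LEVELS)
    counts = [0] * m      # how often nature has played each theta_k
    payoff = [0.0] * m    # summed H(rule_t, theta_k) over the rules played so far
    lower, upper = 0.0, 1.0
    k = 0                 # nature's first move (arbitrary)
    for t in range(1, iterations + 1):
        counts[k] += 1
        b = benefits(best_rule(counts))
        # Bayes value against the empirical prior counts / t: an upper bound.
        upper = min(upper, sum(bk * c for bk, c in zip(b, counts)) / t)
        payoff = [p + bk for p, bk in zip(payoff, b)]
        # The uniform mixture of the rules played so far guarantees min_k payoff_k / t.
        lower = max(lower, min(payoff) / t)
        k = min(range(m), key=lambda i: payoff[i])  # nature's best reply
    return lower, upper


low, high = fictitious_play(2000)
```
      </preformat>
      <p>Running longer can only tighten both bounds, because they are running extrema along one deterministic trajectory. Fictitious play is known to converge slowly, so a linear programming solver is preferable when the exact value is needed.</p>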
      <p>So the probability that the Statistician makes a correct decision about the test-taker's knowledge level from the test results lies between 0.52 and 0.56. Thus, in the worst case, the Statistician makes an incorrect decision about the test-taker's knowledge level in nearly half of the cases (about 44% to 48%) [Sha13]. The game value is a guaranteed (lower) bound on the probability of a correct decision and can be improved if the a priori distribution is known. In addition, it seems unlikely that the actual a priori distribution of knowledge levels coincides with the worst a priori distribution [Sha14].</p>
      <p>Although the above formulation does not take into account all features of how testing is organized, it can be refined if necessary. However, the value of the game will not improve much if more items are added to the test. Similar examples are considered in [Lut14].</p>
    </sec>
    <sec id="sec-4">
      <title>Rasch model</title>
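      <p>The modern method of assessing the test-takers' knowledge level is based on Item Response Theory (IRT) [Lin97]. Let us list the main assumptions of this theory.</p>
      <list list-type="bullet">
        <list-item><p>Each test-taker has a certain knowledge level θ from the set of possible (admissible) levels <inline-formula><tex-math>\Theta \subseteq \mathbb{R}</tex-math></inline-formula>.</p></list-item>
        <list-item><p>Each test item τ is assigned a characteristic function <inline-formula><tex-math>p_\tau(\theta)</tex-math></inline-formula> of the solvability of this item. Its value is the probability that the item is completed by a test-taker with knowledge level θ. Obviously, <inline-formula><tex-math>0 \le p_\tau(\theta) \le 1</tex-math></inline-formula> for <inline-formula><tex-math>\theta \in \Theta</tex-math></inline-formula>.</p></list-item>
        <list-item><p>The test-taker's knowledge level is assessed from the results of performing <inline-formula><tex-math>N</tex-math></inline-formula> items <inline-formula><tex-math>\tau_1, \tau_2, \dots, \tau_N</tex-math></inline-formula> with characteristic functions <inline-formula><tex-math>p_1(\theta), p_2(\theta), \dots, p_N(\theta)</tex-math></inline-formula>.</p></list-item>
        <list-item><p>The difficulty of an item τ and the knowledge level θ of a test-taker are measured in the same units, so the difference τ − θ shows how much the difficulty of the item exceeds the test-taker's knowledge level [Lut15].</p></list-item>
      </list>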
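      <p>In Item Response Theory, it is assumed that the probability that a test-taker with knowledge level θ correctly completes an item of difficulty τ equals
        <disp-formula><tex-math>p_\tau(\theta) = \sigma(\theta - \tau) = \bigl(1 + \exp(-(\theta - \tau))\bigr)^{-1}</tex-math></disp-formula>
        (the Rasch model), where σ denotes the logistic function.</p>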
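      <p>A direct transcription of the Rasch formula is sketched below. Splitting the computation into two algebraically equal branches, so that the exponential never overflows, is our implementation choice; the function names are hypothetical. The expected score <inline-formula><tex-math>\sum_j p_{\tau_j}(\theta)</tex-math></inline-formula> is the quantity that appears on the left-hand side of the person equations of system (2).</p>
      <preformat>
```python
from math import exp


def rasch_probability(theta, tau):
    """Rasch model: probability that level theta completes an item of difficulty tau."""
    z = theta - tau
    # Both branches equal 1 / (1 + exp(-z)); the split keeps exp() from overflowing.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def expected_score(theta, difficulties):
    """Expected number of completed items: sum over j of p_{tau_j}(theta)."""
    return sum(rasch_probability(theta, tau) for tau in difficulties)
```
      </preformat>
      <p>For example, <monospace>rasch_probability(0.7, 0.7)</monospace> returns 0.5: a test-taker whose level equals the item difficulty completes the item with probability one half.</p>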
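      <p>We now turn to the general case of parameter estimation in the Rasch model. Suppose that <inline-formula><tex-math>n</tex-math></inline-formula> test-takers take a test containing <inline-formula><tex-math>N</tex-math></inline-formula> items with difficulties <inline-formula><tex-math>\tau_1 &lt; \tau_2 &lt; \dots &lt; \tau_N</tex-math></inline-formula>. Then the probability that the <inline-formula><tex-math>i</tex-math></inline-formula>-th test-taker completes the <inline-formula><tex-math>j</tex-math></inline-formula>-th item of the test equals
        <disp-formula><tex-math>p_{ij} = \bigl(1 + \exp(-(\theta_i - \tau_j))\bigr)^{-1}, \qquad \tau_j \in \mathbb{R}.</tex-math></disp-formula>
        Let <inline-formula><tex-math>s_j</tex-math></inline-formula> be the number of participants who correctly completed item <inline-formula><tex-math>j</tex-math></inline-formula> (the raw score of item <inline-formula><tex-math>j</tex-math></inline-formula>), and let <inline-formula><tex-math>r_i</tex-math></inline-formula> be the number of items correctly completed by participant <inline-formula><tex-math>i</tex-math></inline-formula> (as a rule, the <inline-formula><tex-math>r_i</tex-math></inline-formula> take all integer values from 0 to <inline-formula><tex-math>N</tex-math></inline-formula> inclusive). Estimates <inline-formula><tex-math>\theta_0, \theta_1, \dots, \theta_N;\ \tau_1, \dots, \tau_N</tex-math></inline-formula> of the corresponding parameters can be obtained by the method of moments or by the maximum likelihood method. To do this, one needs to solve the system of equations
        <disp-formula><label>(2)</label><tex-math>\begin{cases} \displaystyle\sum_{j=1}^{N} p_{ij} = r_i, &amp; i = \overline{1,n}, \\ \displaystyle\sum_{i=1}^{n} p_{ij} = s_j, &amp; j = \overline{1,N}. \end{cases}</tex-math></disp-formula>
        The possible values of the right-hand sides <inline-formula><tex-math>r_i</tex-math></inline-formula> of this system are integers from 0 to <inline-formula><tex-math>N</tex-math></inline-formula>, and test-takers with the same score obtain the same estimate <inline-formula><tex-math>\theta_{r_i}</tex-math></inline-formula>. So system (2) consists of 2N + 1 distinct equations and contains 2N + 1 unknowns.</p>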
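      <p>System (2) can be solved numerically in several ways. A minimal sketch, assuming a 0/1 response matrix in which no test-taker has score 0 or N and no item was completed by all or by none of the test-takers (otherwise some estimates are infinite): each equation of (2) is monotone in its own unknown, so the sketch alternates one-dimensional bisection solves for all <inline-formula><tex-math>\theta_i</tex-math></inline-formula> and then for all <inline-formula><tex-math>\tau_j</tex-math></inline-formula> (a joint maximum likelihood scheme). Because (2) only determines the differences <inline-formula><tex-math>\theta_i - \tau_j</tex-math></inline-formula>, the difficulties are centered to mean zero; this normalization, the function names, and the iteration limits are our assumptions.</p>
      <preformat>
```python
from math import exp


def rasch_p(theta, tau):
    """Rasch probability (1 + exp(-(theta - tau)))^(-1), written to avoid overflow."""
    z = theta - tau
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def solve_increasing(f, target, lo=-30.0, hi=30.0, steps=60):
    """Bisection: find t in [lo, hi] with f(t) = target for an increasing f."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def estimate_rasch(responses, sweeps=2000, tol=1e-10):
    """Solve system (2); responses[i][j] = 1 if test-taker i completed item j."""
    n, n_items = len(responses), len(responses[0])
    r = [sum(row) for row in responses]                             # r_i
    s = [sum(row[j] for row in responses) for j in range(n_items)]  # s_j
    theta, tau = [0.0] * n, [0.0] * n_items
    for _ in range(sweeps):
        # Person equations sum_j p_ij = r_i: increasing in theta_i.
        theta = [solve_increasing(lambda t: sum(rasch_p(t, b) for b in tau), r[i])
                 for i in range(n)]
        # Item equations sum_i p_ij = s_j: increasing in u = -tau_j.
        tau = [-solve_increasing(lambda u: sum(rasch_p(a, -u) for a in theta), s[j])
               for j in range(n_items)]
        # Only the differences theta_i - tau_j are identified: center the difficulties.
        shift = sum(tau) / n_items
        theta = [t - shift for t in theta]
        tau = [b - shift for b in tau]
        residual = max(abs(sum(rasch_p(theta[i], b) for b in tau) - r[i])
                       for i in range(n))
        if tol > residual:
            break
    return theta, tau
```
      </preformat>
      <p>Test-takers with equal scores receive identical estimates, in line with the count of 2N + 1 distinct unknowns, and items completed by fewer test-takers receive larger difficulties.</p>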
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, the problem of calculating the reliability of decisions made on the basis of test results was formulated and solved. Solving the statistical games yields the optimal randomized decision rule (the best assessment of the test-taker's knowledge level), the probability of a correct decision under this rule, and the worst a priori distribution of the test-takers' knowledge levels. The advantage of this approach is that we impose no restrictions on the distribution of test-taker types and that these statistical games are solved by standard methods. In addition, the resulting solution is fairly robust to small changes in the problem conditions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[Lin97] <string-name><surname>van der Linden</surname> <given-names>W.J.</given-names></string-name>, <string-name><surname>Hambleton</surname> <given-names>R.K.</given-names></string-name>. <source>Handbook of Modern Item Response Theory</source>. New York: Springer-Verlag, <year>1997</year>. 510 p.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[Lut90] <string-name><surname>Lutsenko</surname> <given-names>M.M.</given-names></string-name>. <article-title>Game theoretic method for assessment the parameter of the binomial distribution</article-title>. <source>Probability Theory and Its Applications</source>, <year>1990</year>, No. 3, pp. <fpage>471</fpage>–<lpage>481</lpage>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[Lut00] <string-name><surname>Lutsenko</surname> <given-names>M.M.</given-names></string-name>, <string-name><surname>Ivanov</surname> <given-names>M.A.</given-names></string-name>. <article-title>Minimax confidence intervals for the parameter of a hypergeometric distribution</article-title>. <source>Automation and Remote Control</source>, <year>2000</year>, No. 7, pp. <fpage>1125</fpage>–<lpage>1132</lpage>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[Lut03] <string-name><surname>Lutsenko</surname> <given-names>M.M.</given-names></string-name>, <string-name><surname>Maloshevskii</surname> <given-names>S.G.</given-names></string-name>. <article-title>Minimax confidence intervals for the binomial parameter</article-title>. <source>Journal of Statistical Planning and Inference</source>, <year>2003</year>, No. 1, pp. <fpage>67</fpage>–<lpage>77</lpage>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[Lut11] <string-name><surname>Lutsenko</surname> <given-names>M.M.</given-names></string-name>, <string-name><surname>Shadrinceva</surname> <given-names>N.V.</given-names></string-name>. <article-title>Educational testing accuracy</article-title>. <source>News of St. Petersburg State Transport University</source>, <year>2011</year>, No. <volume>4</volume> (<issue>29</issue>), pp. <fpage>250</fpage>–<lpage>258</lpage>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[Lut14] <string-name><surname>Lutsenko</surname> <given-names>M.M.</given-names></string-name>, <string-name><surname>Seytmanbitov</surname> <given-names>D.A.</given-names></string-name>. <article-title>Test explicitly in Rasch model</article-title>. <source>Proceedings of the International Banking Institute</source>, <year>2014</year>, pp. <fpage>114</fpage>–<lpage>116</lpage>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[Lut15] <string-name><surname>Lutsenko</surname> <given-names>M.</given-names></string-name>, <string-name><surname>Seytmanbitov</surname> <given-names>D.</given-names></string-name>. <article-title>Game-theory method for knowledge assessment</article-title>. <source>SING11-GTM2015 European Meeting on Game Theory</source>, <year>2015</year>, pp. <fpage>125</fpage>–<lpage>126</lpage>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[Ney00] <string-name><surname>Neyman</surname> <given-names>Yu.M.</given-names></string-name>, <string-name><surname>Khlebnikov</surname> <given-names>V.A.</given-names></string-name>. <source>Introduction to the theory of modeling and parameterization of pedagogical tests</source>. <year>2000</year>. 168 p.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[Sha13] <string-name><surname>Shadrinceva</surname> <given-names>N.V.</given-names></string-name>, <string-name><surname>Seytmanbitov</surname> <given-names>D.A.</given-names></string-name>. <article-title>About the reliability of testing in the Rasch model</article-title>. <source>Mathematical modeling in education, science, and manufacturing</source>, <year>2013</year>, pp. <fpage>156</fpage>–<lpage>157</lpage>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[Sha14] <string-name><surname>Shadrinceva</surname> <given-names>N.V.</given-names></string-name>, <string-name><surname>Seytmanbitov</surname> <given-names>D.A.</given-names></string-name>. <article-title>Reliability of testing in the Rasch model</article-title>. <source>Institute of Information Technology and Management, SPbSTU</source>, <year>2014</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>