<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An enrolment admission strategy based on data analytics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Polytechnique Montreal</string-name>
        </contrib>
      </contrib-group>
      <fpage>73</fpage>
      <lpage>79</lpage>
      <abstract>
        <p>Every university program that has a limited capacity of enrolment faces the task of selecting the candidates that have the best chance of success. We introduce a selection strategy based on data analytics that only requires a ranking of candidates from different sources to determine a number of candidates to select from each source. The strategy relies on the distribution of student marks and on historical data of each source. It consists in determining a minimal threshold mark which, in turn, is used to determine proportions of students to admit from each source. The strategy ensures a maximum success rate under certain assumptions.</p>
      </abstract>
      <kwd-group>
        <kwd>Student Enrolment</kwd>
        <kwd>Learning Analytics</kwd>
        <kwd>Candidate Selection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A case for the use of Learning Analytics in educational institutions can be made
for the objective of selecting the candidates who have the best chance of success
in a given university program. In the words of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we can consider the selection
process as a standard machine learning prediction task:
      </p>
      <p>“Admission is to a great extent a prediction task, where admissions
committees aim at estimating a candidate’s chance of future study
success. For these kinds of tasks, Meehl (1954) provided strong evidence
for the superiority of the statistical approach over the clinical one. Since
then, a plethora of studies has challenged this result but none
contradicted Meehl’s conclusion (Kahneman, 2011).”</p>
      <p>
        While the candidate selection problem is trivial if the decision is based on a
single criterion, such as an admission test score (e.g., GPA [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
or GRE), or on any single score by which candidates can be ranked, such a score
is not always available. Often, the decision must rely on a set of scores that are
not comparable.
      </p>
      <p>
        The typical situation is that an admission decision is based on the ranking of
students within a given cohort and for a given institution. The choice is simple for
students from the same institution, but not for students from different
institutions. One solution is to ask candidates to take an admission
exam, but this is impractical for students who apply from abroad or from distant
locations. Moreover, the admission test may not be highly reliable [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Other solutions, often considered more reliable, are to resort to interviews
and personal statements [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, not only is their reliability questioned [
        <xref ref-type="bibr" rid="ref4 ref5 ref7">4, 5, 7</xref>
        ], but these approaches also demand time and effort, which can be critical
for large cohorts.
      </p>
      <p>We introduce a means to decide on student admission based on historical data
of the host institution itself. Given the information on student marks and their
origin, the approach consists in determining the proportion of students from a
given origin that are above a given score. It relies on computing the
expected mean score of the proportion of students above a given score for a given
origin. The key to the approach is that the scores of all students are on the
same scale, namely the institution’s own grades.</p>
      <p>The strategy is first described below, followed by a short demonstration of
the impact it has compared to a simpler solution.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Historical data cutoff admission threshold (HDCT)</title>
      <p>We will refer to the proposed approach as the Historical data cutoff admission
threshold (HDCT). To illustrate its basic principle, consider Figure 1. It shows
a distribution of student scores on a Z-scale that follows a Normal
distribution (N(0, 1)), along with the proportion of students above each score, which
corresponds to one minus the cumulative distribution function (labeled
“cummul. admiss.”). The dotted line indicates that 50% of students are above a
score of 0. We can also see from the
“cummul. admiss.” curve that about 70% of students are above a score of
Z = −0.5.</p>
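      <p>As a minimal sketch of the “cummul. admiss.” curve, the following Python snippet evaluates one minus the cumulative distribution function of N(0, 1) at a few scores. It relies on <monospace>statistics.NormalDist</monospace> from the Python standard library; the function name <monospace>proportion_above</monospace> is ours and purely illustrative.</p>
      <preformat>
```python
from statistics import NormalDist


def proportion_above(z, mean=0.0, sd=1.0):
    """Proportion of students scoring above z, i.e. one minus the CDF."""
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(z)


if __name__ == "__main__":
    # Walk along the Z-scale and print the share of students above each score.
    for z in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(f"Z = {z:+.1f}: {proportion_above(z):.3f} of students above")
```
      </preformat>
      <p>Lowering the score raises the proportion retained, which is why a single cutoff score can be traded for a target admission proportion.</p>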
      <p>This graph is the basis of the HDCT admission process. The general
principle is to determine the proportion of students to retain based on a common
minimal score, obtained from the institution’s historical data. Given that it is
reasonable to assume that all student scores are on the same scale, namely the
institution’s historical scores, they are comparable even though the students may
have different origins. The key is to base the decision not on a score obtained
from the origin institution, but on historical data from the host institution. This
approach requires that the institution keep track of which origin institution each
student comes from and, as we discuss later, of the ratio of admitted students
to the number of candidates.</p>
      <p>To illustrate the general approach, Figure
2 shows the case where we have students from three different origins: sources a,
b, and c. The mean, standard deviation (s.d.), and relative proportion of
candidates from each source are shown below, along with the proportion admitted
at the 50% cutoff threshold.</p>
      <p>[Figure: Score distribution and cumulative admission. Legend: 50% cutoff threshold; score distribution; cummul. admiss.]</p>
      <p>We can see from the cumulative distribution curves that source a (mean = −0.5),
source b (mean = 0), and source c (mean = 0.5) respectively represent 50%, 27%, and
23% of all student applicants. Because the spread of the distributions is not
equal (standard deviations of 0.5, 0.2, 0.2) and the sources also have uneven proportions, the cutoff threshold
to admit 50% of students is not at Z = 0, but instead around Z = −0.07. This
threshold is shown as the dotted line in Figure 2: the score where the global
cumulative distribution curve reaches 50% of all students, which in turn
corresponds to 20% of source a, 64% of source b, and almost all of source c.</p>
      <p>The implication of this graph is that if we had, for example, 1000 candidates
and we wanted to admit only 500 of them, then only about 100 of the 500 candidates from source a would be
admitted, because its students have scored on average 0.5 standard deviation below the mean
in the historical data. By contrast, a policy of admitting the same ratio for
all sources would admit 250 of them. The divergence from a
uniform admission ratio is also pronounced for the other two sources: almost
all students from source c would be admitted because they historically scored
0.5 standard deviation above average and have a lower standard deviation, and
most of source b would also be accepted.</p>
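      <p>The per-source counts implied by a common threshold can be sketched as follows. The snippet assumes our reading of the example: source a scoring 0.5 standard deviation below the mean and source c 0.5 above, standard deviations of 0.5, 0.2 and 0.2, applicant shares of 50%, 27% and 23%, a threshold of Z = −0.07, and 1000 candidates. The names <monospace>SOURCES</monospace>, <monospace>admitted_counts</monospace> and <monospace>equal_ratio_counts</monospace> are hypothetical.</p>
      <preformat>
```python
from statistics import NormalDist

# Assumed example parameters: (mean, s.d., share of applicants) per source.
SOURCES = {
    "a": (-0.5, 0.5, 0.50),
    "b": (0.0, 0.2, 0.27),
    "c": (0.5, 0.2, 0.23),
}


def admitted_counts(z, n_candidates, sources=SOURCES):
    """Expected number admitted per source when every score above z is kept."""
    counts = {}
    for name, (mean, sd, share) in sources.items():
        above = 1.0 - NormalDist(mean, sd).cdf(z)  # share of this source above z
        counts[name] = n_candidates * share * above
    return counts


def equal_ratio_counts(ratio, n_candidates, sources=SOURCES):
    """Expected number admitted per source when one ratio is applied to all."""
    return {name: n_candidates * share * ratio
            for name, (_, _, share) in sources.items()}


if __name__ == "__main__":
    hdct = admitted_counts(z=-0.07, n_candidates=1000)
    equal = equal_ratio_counts(ratio=0.5, n_candidates=1000)
    for name in SOURCES:
        print(f"source {name}: common threshold {hdct[name]:.0f}, "
              f"equal ratio {equal[name]:.0f}")
```
      </preformat>
      <p>Both policies admit roughly 500 candidates in total; only the split between sources changes.</p>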
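      <p>The Z-score corresponding to the proportion of students we wish to admit
from the total applicants is calculated with an optimization function that
can be defined as:</p>
      <disp-formula><tex-math>Z^{*} = \arg\min_{Z} \left( \sum_{s \in \mathit{source}} \mathit{prop}_{s} \cdot \bigl( 1 - \mathrm{pnorm}(Z, \mathit{sco}_{s}, \mathit{sd}_{s}) \bigr) - \mathit{prop.admitted} \right)^{2}</tex-math></disp-formula>
      <p>where:</p>
      <list list-type="bullet">
        <list-item><p>prop<sub>s</sub> is the proportion of applicants from a given source, s,</p></list-item>
        <list-item><p>pnorm is the cumulative distribution function (for the Normal distribution), so that 1 − pnorm is the proportion of the source above Z; it takes as arguments:</p>
          <list list-type="bullet">
            <list-item><p>Z: the Z-score to optimize (threshold),</p></list-item>
            <list-item><p>sco<sub>s</sub>: the mean historical score of the given source, and</p></list-item>
            <list-item><p>sd<sub>s</sub>: its standard deviation;</p></list-item>
          </list>
        </list-item>
        <list-item><p>prop.admitted: the proportion of students we wish to admit to meet the
limited admission capacity.</p></list-item>
      </list>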
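      <p>One possible way to solve this optimization is sketched below in Python. Since admitted students are those above the threshold, the sketch matches one minus the cumulative distribution function to the target proportion, as in the description of Figure 1. The text does not say which optimizer is used; we swap in a plain bisection, which is valid here because the overall proportion above Z decreases as Z grows, so the squared gap reaches zero at a single point. The source parameters and the names <monospace>overall_admitted</monospace>, <monospace>squared_gap</monospace> and <monospace>find_threshold</monospace> are our assumptions.</p>
      <preformat>
```python
from statistics import NormalDist

# Assumed example parameters: (share of applicants, mean score, s.d.) per source.
SOURCES = [
    (0.50, -0.5, 0.5),  # source a
    (0.27, 0.0, 0.2),   # source b
    (0.23, 0.5, 0.2),   # source c
]


def overall_admitted(z, sources):
    """Overall proportion of applicants scoring above z."""
    return sum(prop * (1.0 - NormalDist(sco, sd).cdf(z))
               for prop, sco, sd in sources)


def squared_gap(z, sources, prop_admitted):
    """Objective: squared gap between admitted and target proportions."""
    return (overall_admitted(z, sources) - prop_admitted) ** 2


def find_threshold(sources, prop_admitted, lo=-10.0, hi=10.0, tol=1e-9):
    """Bisection on z; relies on overall_admitted decreasing as z grows."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if overall_admitted(mid, sources) > prop_admitted:
            lo = mid  # too many admitted: raise the threshold
        else:
            hi = mid  # too few admitted: lower the threshold
    return (lo + hi) / 2.0


if __name__ == "__main__":
    z = find_threshold(SOURCES, prop_admitted=0.5)
    print(f"threshold Z = {z:.3f}, objective = {squared_gap(z, SOURCES, 0.5):.2e}")
    for (prop, sco, sd), name in zip(SOURCES, "abc"):
        print(f"source {name}: {1.0 - NormalDist(sco, sd).cdf(z):.2f} admitted")
```
      </preformat>
      <p>A general-purpose minimizer applied to <monospace>squared_gap</monospace> should reach the same threshold; bisection is used only to keep the sketch free of external dependencies.</p>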
      <sec id="sec-2-1">
        <title>Smoothing factor</title>
        <p>In some cases, the number of students from a given source may be small, or even
zero if it represents a new source. To avoid the extreme values of the mean and
standard deviation that result from small samples, a smoothing factor should
be used. Assuming we have N<sub>s</sub> students from source s, a smoothing factor α can
be used to pull the source mean toward the general mean with the following smoothing formula:</p>
        <disp-formula><tex-math>\hat{x}_{s} = \frac{\sum_{i}^{N_{s}} x_{i} + \alpha \bar{x}}{N_{s} + \alpha}</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>\hat{x}_{s}</tex-math></inline-formula> is the smoothed value that should replace the value of the mean and
<inline-formula><tex-math>\bar{x}</tex-math></inline-formula> is the general mean of all students. A reasonable value is α = 5,
although the choice is rather arbitrary. A similar smoothing should be applied
to the standard deviation based on historical data.</p>
      </sec>
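      <p>The effect of the smoothing formula on the mean can be illustrated with a short Python sketch. The scores below are made up for the illustration, and the function name <monospace>smoothed_mean</monospace> is hypothetical; the default α = 5 follows the value suggested in the text.</p>
      <preformat>
```python
def smoothed_mean(scores, general_mean, alpha=5.0):
    """Shrink a source mean toward the general mean.

    Equivalent to adding alpha pseudo-students who all score general_mean.
    """
    return (sum(scores) + alpha * general_mean) / (len(scores) + alpha)


if __name__ == "__main__":
    general_mean = 0.0
    few = [1.2, 0.9]         # made-up source with only two students
    many = [1.2, 0.9] * 50   # same raw mean, but 100 students
    print(f"raw mean:        {sum(few) / len(few):.3f}")
    print(f"smoothed, N=2:   {smoothed_mean(few, general_mean):.3f}")
    print(f"smoothed, N=100: {smoothed_mean(many, general_mean):.3f}")
    print(f"new source, N=0: {smoothed_mean([], general_mean):.3f}")
```
      </preformat>
      <p>With few students the smoothed mean stays close to the general mean, with many students it approaches the raw source mean, and a new source with no history simply receives the general mean.</p>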
    </sec>
    <sec id="sec-3">
      <title>Impact example</title>
      <p>To assess the impact of the admission strategy over a simpler one, we run a
simulation and compare the difference in the expected scores of students admitted
with each strategy.</p>
      <p>The simpler strategy is to accept an equal proportion of students from each
source.</p>
      <p>Let us take the numbers from Figure 2 to run a simulation and assume we
admit 1000 students. The expected average score from a given source
corresponds to the number of students at a given score (freq(sco)), proportionally
represented by the source’s density of the distribution, times the score. This is
repeated for each source and divided by the number of students (N):</p>
      <disp-formula><tex-math>E(\mathit{sco}) = \frac{\sum_{s \in \mathit{source}} \sum_{\mathit{sco} \in s} \mathit{freq}(\mathit{sco}) \times \mathit{sco}}{N}</tex-math></disp-formula>
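      <p>Under the Normal assumption, the frequency-weighted sum above has a closed form: the expected score of the students of one source above a threshold Z is μ + σ φ(a) / (1 − Φ(a)), with a = (Z − μ) / σ, the mean of a Normal distribution truncated below at Z. The Python sketch below uses this shortcut in place of a simulation. It assumes our reading of the example parameters (means −0.5, 0 and 0.5, standard deviations 0.5, 0.2 and 0.2, shares 50%, 27% and 23%), a common threshold of Z = −0.07, and an equal-proportion strategy that admits the top half of each source; all function names are hypothetical.</p>
      <preformat>
```python
from statistics import NormalDist

STD = NormalDist()  # standard Normal, used for its pdf and cdf

# Assumed example parameters: (share of applicants, mean score, s.d.).
SOURCES = {
    "a": (0.50, -0.5, 0.5),
    "b": (0.27, 0.0, 0.2),
    "c": (0.23, 0.5, 0.2),
}


def mean_above(mean, sd, z):
    """Expected score of the students of one source who score above z."""
    a = (z - mean) / sd
    return mean + sd * STD.pdf(a) / (1.0 - STD.cdf(a))


def expected_score(thresholds, sources=SOURCES):
    """Expected score of all admitted students, weighted by numbers admitted."""
    total_score = 0.0
    total_admitted = 0.0
    for name, (share, mean, sd) in sources.items():
        z = thresholds[name]
        admitted = share * (1.0 - NormalDist(mean, sd).cdf(z))
        total_score += admitted * mean_above(mean, sd, z)
        total_admitted += admitted
    return total_score / total_admitted


def equal_proportion_thresholds(ratio, sources=SOURCES):
    """Per-source cutoffs that admit the same top fraction of every source."""
    return {name: NormalDist(mean, sd).inv_cdf(1.0 - ratio)
            for name, (_, mean, sd) in sources.items()}


if __name__ == "__main__":
    common = expected_score({name: -0.07 for name in SOURCES})
    equal = expected_score(equal_proportion_thresholds(0.5))
    print(f"common threshold: {common:.2f}, equal proportion: {equal:.2f}")
```
      </preformat>
      <p>With these assumed parameters the two values should come out close to the expected scores quoted later in this section for HDCT and for the equal-proportion strategy.</p>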
      <p>The numbers that correspond to each strategy for each source are reported
in the following table:</p>
      <p>[Table: equal proportion versus HDCT, for each source.]</p>
      <p>The major difference between the equal proportion and the proposed HDCT
approach is that far fewer candidates are accepted from source a, to the benefit
of greater numbers from sources b and c. The effect is that the expected score
for source a increases while it decreases for sources b and c, but the overall
effect is an increase in the expected score of 0.17 (0.31 − 0.14).</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This paper describes a strategy to select the proportion of candidates to admit
in order to maximize the expected success rate of the students in a given
program. The strategy is based on historical data from the host institution. The
advantage of the approach is that it does not require a standardized score across
students from different institutions, which is unavailable most of the time unless
the candidates are subject to an admission test. Considering that candidates
can come from remote locations and that running an admission test can involve
considerable time and effort, this is a major advantage.</p>
      <p>However, the approach has its limitations, the first of which is the need for
historical data from the different institutions the candidates come from. Often,
the sample can be small and a correction in the form of a smoothing factor is
proposed to alleviate this issue.</p>
      <p>Another limitation is that, as described in this paper, it assumes the
distribution of scores is Gaussian. However, this limitation is not inherent to the general
approach. Non-Gaussian, or even arbitrary, distributions could be handled, but
the computations would need to be adapted to the actual distribution.</p>
      <p>Finally, another issue is that the distributions have to reflect the scores of the
origin institution, which must be derived from the historical data of the accepted
candidates in the host institution. As presented in this paper, we assume the
historical data is a faithful representation of that distribution, but if the selection
is based on a small proportion of applicants, this assumption would be false.
Here again, this is not a limitation of the approach itself, and computational
adjustments would have to take this factor into account. The adjustment will
rely on information about the ratio of admitted students per institution.</p>
      <p>To close the loop on the question of how Learning Analytics can bring value
to education, we considered the admission problem that confronts every institution
that needs to select candidates from disparate sources. The
candidate selection approach uses a strategy that relies on statistics and optimization
techniques. It is an objective, effective, and efficient means to achieve the goal
of selecting the candidates that have the best chances of success.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Didier</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kreiter</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solow</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Investigating the utility of a GPA institutional adjustment index</article-title>
          .
          <source>Advances in Health Sciences Education</source>
          <volume>11</volume>
          (
          <issue>2</issue>
          ),
          <fpage>145</fpage>
          -
          <lpage>153</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Eva</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>H.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosenfeld</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trinh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wood</surname>
            ,
            <given-names>T.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norman</surname>
            ,
            <given-names>G.R.</given-names>
          </string-name>
          :
          <article-title>Association between a medical school admission process using the multiple mini-interview and national licensing examination scores</article-title>
          .
          <source>JAMA</source>
          <volume>308</volume>
          (
          <issue>21</issue>
          ),
          <fpage>2233</fpage>
          -
          <lpage>2240</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kahneman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Thinking, fast and slow</article-title>
          .
          <source>Macmillan</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Meehl</surname>
            ,
            <given-names>P.E.</given-names>
          </string-name>
          :
          <article-title>Clinical versus statistical prediction: A theoretical analysis and a review of the evidence</article-title>
          . (
          <year>1954</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>S.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klieger</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borneman</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuncel</surname>
            ,
            <given-names>N.R.</given-names>
          </string-name>
          :
          <article-title>The predictive power of personal statements in admissions: A meta-analysis and cautionary tale</article-title>
          .
          <source>College and University</source>
          <volume>84</volume>
          (
          <issue>4</issue>
          ),
          <fpage>83</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Salvatori</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Reliability and validity of admissions tools used to select students for the health professions</article-title>
          .
          <source>Advances in Health Sciences Education</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>159</fpage>
          -
          <lpage>175</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Siu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>H.I.</given-names>
          </string-name>
          :
          <article-title>Overview: what's worked and what hasn't as a guide towards predictive admissions tool development</article-title>
          .
          <source>Advances in Health Sciences Education</source>
          <volume>14</volume>
          (
          <issue>5</issue>
          ),
          <fpage>759</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>von Davier</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heinimann</surname>
            ,
            <given-names>H.R.</given-names>
          </string-name>
          :
          <article-title>Adaptive admissions process for effective and fair graduate admission</article-title>
          .
          <source>International Journal of Educational Management</source>
          <volume>31</volume>
          (
          <issue>4</issue>
          ),
          <fpage>540</fpage>
          -
          <lpage>558</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1108/IJEM-06-2015-0080
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>