<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Assessment analytics for peer-assessment: a model and implementation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Blaženka Divjak</string-name>
          <email>bdivjak@foi.hr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Darko Grabar</string-name>
          <email>darko.grabar@foi.hr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcel Maretić</string-name>
          <email>mmaretic@foi.hr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Organization and Informatics, University of Zagreb</institution>
          <addr-line>Pavlinska 2, 42000 Varaždin</addr-line>
          <country country="HR">Croatia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Organization and Informatics, University of Zagreb</institution>
          <addr-line>Pavlinska 2, 42000 Varaždin</addr-line>
          <country country="HR">Croatia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Faculty of Organization and Informatics, University of Zagreb</institution>
          <addr-line>Pavlinska 2, 42000 Varaždin</addr-line>
          <country country="HR">Croatia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Learning analytics should go beyond data analysis and include approaches and algorithms that are meaningful for learner performance, that can be interpreted by the teacher, and that can be related to learning outcomes. Assessment analytics has been lagging behind other research in learning analytics; this holds true especially for peer-assessment analytics. In this paper we present a mathematical model for peer-assessment based on the use of scoring rubrics for criteria-based assessment. We propose methods for the calculation of the final grade, along with reliability measures of peer-assessment. The modeling is motivated and driven by the identified peer-assessment scenarios.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Jonsson and Svingby in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] conclude that the use of scoring rubrics enhances the reliability of assessments, especially if the rubrics are analytic, topic-specific, and complemented with examples and/or rater training. Otherwise, scoring rubrics do not facilitate valid judgment of performance assessments. Besides this, rubrics have the potential to promote learning and/or improve instruction.
      </p>
      <p>
        The aim of this paper is to model peer-assessment and to discuss issues of final grade calculation and the reliability of raters' judgments. Jonsson and Svingby note that variations in raters' judgments can occur either across raters, known as inter-rater reliability, or in the consistency of one single rater, called intra-rater reliability. Referring to [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Jonsson and Svingby state that a major threat to reliability is the lack of consistency of an individual grader, yet reports rarely mention this measure. On the other hand, inter-rater reliability is mentioned in some form in more than half of the reports, but many of these simply use percentage agreement as the measure. This is in agreement with Sadler and Good's critique in [
        <xref ref-type="bibr" rid="ref13">14</xref>
        ] of the poor quality of quantitative research regarding self-assessment. The situation has improved since then; nevertheless, the majority of current research still uses overly simple statistical measures to determine correlations that might indicate reliability.
      </p>
      <p>In the following sections we describe the two major peer-assessment scenarios we have recognized, and then present and analyze the mathematical model we have developed for them.</p>
    </sec>
    <sec id="sec-2">
      <title>2. SCENARIOS FOR PEER-ASSESSMENT</title>
      <p>
        Reliability of peer-assessment depends on many factors, but the consistency of the individual evaluator was recognized very early as the most important one (see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). On the other hand, having more assessments per assignment increases the reliability of peer-assessment with relatively inexperienced evaluators. From experienced evaluators (experts) we presume high expertise in the domain knowledge and prior experience in evaluation. By contrast, an inexperienced evaluator is an individual with a relatively high level of domain knowledge (a high baseline) but lacking experience in evaluation (e.g. a senior undergraduate performing peer-assessment).
      </p>
      <p>We analyze scenarios with respect to the experience of the evaluator, as shown in the scenario grid (Fig. 1). We have placed a continuum of possible scenarios in a grid with four quadrants, within which we recognize two interesting scenarios for peer-assessment and discard the other two as either unrealistic or inappropriate.</p>
      <p>In the first scenario, call it Scenario A, participants are inexperienced evaluators (for example, undergraduate students with introductory domain knowledge and no experience in peer-assessment), whereas in Scenario B the evaluators have higher expertise in the evaluated domain (e.g. teachers, graduate students or senior undergraduates) and prior training in assessment. In Scenario A, the lack of experience in evaluation must be compensated by the quantity of peer-assessments, i.e. a larger peer-assessment group size. On the other hand, setting the group size too large in Scenario B needlessly wastes the experts' time.</p>
      <p>[Figure 1: Scenario grid. Horizontal axis: size of the peer-assessment group (small to large); vertical axis: evaluator experience (inexperienced evaluators, Scenario A; experienced evaluators, Scenario B). Off-diagonal quadrants are marked as not appropriate.]</p>
    </sec>
    <sec id="sec-3">
      <title>3. OVERVIEW OF THE PEER-ASSESSMENT</title>
    </sec>
    <sec id="sec-4">
      <title>ACTIVITY</title>
      <p>The peer-assessment activity starts after work on the assignment task has been completed. In the general case, peer-assessment consists of two phases. We identify the following activities in the whole process.</p>
      <sec id="sec-4-1">
        <title>Phase 1: Assessment of assignments</title>
        <p>i. Learners assess a (predefined) number of assigned assignments
ii. Analysis of peer-assessments (grouped by assignment)
iii. Calculation of the assignment grade</p>
      </sec>
      <sec id="sec-4-2">
        <title>Phase 2: Assessment of the assessments</title>
        <p>i. Analysis of peer-assessments (grouped by grader)
ii. Calculation of the assessment grade</p>
        <p>The first phase starts with learners assessing the assignment work of their peers. We assume that each participant grades several assignments (at least two). At the end of the first phase a reliability check has to be performed and the final grade has to be calculated. The second phase is concerned with the quality of the assessments relative to the evaluator. As an outcome of the second phase, graders can receive a grade (points) for the quality of their assessments.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. MATHEMATICAL MODEL FOR PEER</title>
    </sec>
    <sec id="sec-6">
      <title>ASSESSMENT</title>
      <p>We recognized three challenges: (1) calculation of the final grade under different assessment scenarios, (2) measurement of the assessment's reliability, and (3) measurement of the reliability of each grader (for grading the graders).</p>
    </sec>
    <sec id="sec-7">
      <title>4.1 Overview of the assignment grading</title>
      <p>
        A grading G from a scoring rubric with n criteria is a tuple of numbers G = (g_1, \dots, g_n). We consider gradings as points in an n-dimensional space endowed with a metric d, i.e. a function that measures the distance between points (gradings) and satisfies the axioms of a metric space.
In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] we proposed the use of the non-Euclidean taxicab metric d_1, but for the purposes of this paper it is sufficient to think of d as any distance metric.
      </p>
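      <p>For illustration, the taxicab metric d_1 can be computed as in the following Python sketch; treating a grading simply as a tuple of per-criterion scores is our simplifying assumption.</p>
      <preformat>
# A grading is a tuple of per-criterion scores (g_1, ..., g_n).
def d1(G, H):
    """Taxicab (L1) distance between two gradings."""
    assert len(G) == len(H), "gradings must use the same rubric"
    return sum(abs(g - h) for g, h in zip(G, H))

# Example: two gradings on a three-criterion rubric.
print(d1((3, 2, 5), (4, 2, 3)))  # 1 + 0 + 2 = 3
      </preformat>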
    </sec>
    <sec id="sec-8">
      <title>4.2 Calculation of the assignment’s final grade</title>
      <p>An assignment graded through peer-assessment receives several peer gradings, which have to be analyzed. If estimated to be reliable, these gradings are used as input for the calculation of the final grade.</p>
      <p>The simplest approach is to calculate the final grade of an assignment as the mean value of the received assessments.</p>
      <p>Let S = \{S_{k,1}, \dots, S_{k,m}\} denote the set of peer gradings for assignment k. The mean grade is M(S) = (a_1^f, \dots, a_n^f), where a_i^f = \frac{1}{m} \sum_{j=1}^{m} c_i^{(k,j)} and c_i^{(k,j)} denotes the score assigned to criterion i by the j-th grading of assignment k. M(S) is the center of mass of the set S. This method of grade calculation is suitable for scenario A. We can say that M(S) is sensitive to quantity and less sensitive to outliers (it "respects the decision of the majority").</p>
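      <p>A minimal Python sketch of the mean grade M(S), the componentwise average of the received gradings (gradings as tuples is our assumption):</p>
      <preformat>
def mean_grade(S):
    """Center of mass of the grading set S: the componentwise mean."""
    m = len(S)     # number of received gradings
    n = len(S[0])  # number of rubric criteria
    return tuple(sum(g[i] for g in S) / m for i in range(n))

# Example: three peer gradings of one assignment.
print(mean_grade([(3, 2, 5), (4, 2, 3), (5, 2, 4)]))  # (4.0, 2.0, 4.0)
      </preformat>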
      <p>
        For scenario B, we propose an alternative grade calculation
method (see [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). In scenario B we assume that peers are experienced evaluators. The final grade is calculated as the so-called optimal final grade O(S) = (o_1^f, \dots, o_n^f), where each component o_i^f is computed from W(S) and B(S), the amalgamations of the worst and best received gradings respectively, as defined therein.
This approach is inspired by Hwang and Yoon's TOPSIS
(Technique for Order of Preference by Similarity to Ideal
Solution) method of multi-criteria decision making in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
When evaluators are trusted experts, we do not expect "wild" gradings (outliers). Here, it is expected that after just a few initial evaluations, additional gradings will have no effect on the final grade O(S). Please consult [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for additional
details.
      </p>
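      <p>The precise definitions of W(S), B(S) and O(S) are given in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]; purely as an illustration, the following sketch assumes that W and B are the componentwise worst and best received scores and that O(S) is their midpoint, a reading consistent with the property that gradings between the extremes do not move O(S).</p>
      <preformat>
def optimal_grade(S):
    """Illustrative reading of O(S); see [5] for the actual definition."""
    n = len(S[0])
    W = [min(g[i] for g in S) for i in range(n)]  # worst amalgamation (assumed)
    B = [max(g[i] for g in S) for i in range(n)]  # best amalgamation (assumed)
    return tuple((W[i] + B[i]) / 2 for i in range(n))
      </preformat>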
      <p>A summary of our recommendations for the two scenarios A and B is given in Table 2.</p>
    </sec>
    <sec id="sec-9">
      <title>4.3 Reliability of the peer-assessment</title>
      <p>A prerequisite for the calculation of the assignment's final grade is determining whether the received set of peer-assessments is (sufficiently) reliable, i.e. acceptable. Reasoning about reliability requires granular data. The importance of granular scoring data is illustrated by the example in Table 3: gradings S1 and S2 agree on the summative level but are very distinct at the granular level. This is an example of an unreliable peer-grading set whose incoherence is not visible on the summative level. We consider a set S of peer gradings reliable if diam S (the maximal pairwise distance between gradings) is less than 2e, where e is an acceptable error given in advance.</p>
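      <p>A sketch of this reliability check under any metric d: the set is reliable when its diameter stays below 2e.</p>
      <preformat>
from itertools import combinations

def diameter(S, d):
    """Largest pairwise distance within the grading set S."""
    return max((d(g, h) for g, h in combinations(S, 2)), default=0.0)

def is_reliable(S, d, e):
    """S is acceptable when it fits within an encompassing e-sphere."""
    return diameter(S, d) &lt; 2 * e
      </preformat>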
      <p>Note that the diameter of the set S is also the diameter of an encompassing sphere, so we can say that a reliable peer-grading set fits within an encompassing e-sphere. If a set of peer-assessments is estimated as unacceptable (unreliable) on the granular level, then the final grade cannot be calculated. A recommendation about the acceptability of a particular peer-assessment set can be given to the teacher or course designer by learning analytics; this can be implemented in the learning management system (LMS, for example Moodle). Related practical issues are discussed in Section 5.</p>
    </sec>
    <sec id="sec-10">
      <title>4.4 Grading process</title>
      <p>An assessment set can turn out to be unacceptable because of a single outlier grading. As an attempt to eliminate the outlier grading, we propose searching for a maximal acceptable subset of the received peer-assessments. If such a subset can be found, it is used as input for the final grade calculation. As a measure of final resort, a supervisor's intervention is requested. In a course with a large student enrollment (thousands for a MOOC) this should be avoided as much as possible. However, if present, the instructor's assessment becomes the final grade (no further calculation is needed). This is described in Algorithm 1.</p>
      <p>Algorithm 1: Semi-autonomous Grading Process</p>
      <preformat>
input : set of gradings S = {S^(1), ..., S^(m)},
        acceptable error e >= 0,
        grading calculation method g,
        critical size N (e.g. N = 3)
output: final grade, or an indication that the gradings S are invalid
1  find a maximal S' ⊆ S with acceptable error
2  if #(S') >= N then
3      find S'' of size #(S'') = #(S') with minimal diameter
4      return g(S'') as the grade for assignment k
5  else ask for teacher intervention (grading)
      </preformat>
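      <p>A Python sketch of Algorithm 1, assuming per-assignment grading sets are small enough for an exhaustive subset search, with diameter() as defined in Section 4.3; returning None signals that teacher intervention is needed.</p>
      <preformat>
from itertools import combinations

def diameter(S, d):
    return max((d(g, h) for g, h in combinations(S, 2)), default=0.0)

def semi_autonomous_grade(S, d, e, g, N=3):
    """Grade with g() over the tightest maximal acceptable subset of S."""
    for size in range(len(S), 1, -1):  # try maximal subsets first
        acceptable = [T for T in combinations(S, size)
                      if diameter(T, d) &lt; 2 * e]
        if acceptable:
            if size &lt; N:
                break  # too few reliable gradings: escalate
            # Among maximal acceptable subsets, pick the one of minimal diameter.
            return g(min(acceptable, key=lambda T: diameter(T, d)))
    return None  # ask for teacher intervention (grading)
      </preformat>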
    </sec>
    <sec id="sec-11">
      <title>4.5 Normalization</title>
      <p>The metric d can be linearly scaled to obtain a normalized metric d' with values in the interval [0, 1]. A distance of d' = 1 corresponds to the maximal distance, i.e. the distance between the worst and the best possible gradings.</p>
      <p>This would facilitate general recommendations for setting the acceptable error e on a normalized scale (setting e' = 0.2, for example). Additionally, it could facilitate comparison of data from different tasks (within a course, or from different courses).</p>
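      <p>A sketch of this normalization; g_min and g_max, the worst and best possible gradings on the rubric, are assumed inputs.</p>
      <preformat>
def normalized_metric(d, g_min, g_max):
    """Scale d linearly so that the worst-to-best distance equals 1."""
    d_max = d(g_min, g_max)  # largest achievable distance on the rubric
    return lambda G, H: d(G, H) / d_max
      </preformat>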
    </sec>
    <sec id="sec-12">
      <title>4.6 Evaluation of peer-assessments (awarding the graders)</title>
      <p>The goal of the second phase of the peer-assessment process is to reward the graders for their effort. Graders (peers) who have graded consistently and accurately (near the final grade) should be rewarded more than inconsistent and inaccurate graders.</p>
      <p>Let us assume that a maximum of A points is awarded for the peer-assessment task. Then grader k can be awarded A_i points for each of the m gradings G_i that he or she was assigned, where A_i is a piecewise function, non-increasing in d_i (the distance of the grading G_i from the final grade) and parametrized by A, m and the acceptable error e.</p>
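      <p>Since the exact piecewise formula is not reproduced above, the following sketch is an illustration only: full credit A/m for a grading within the acceptable error e of the final grade, decaying linearly to zero afterwards (the cutoff factor is a hypothetical knob).</p>
      <preformat>
def award(d_i, A, m, e, cutoff=2.0):
    """Points for one grading at distance d_i from the final grade."""
    per_grading = A / m
    if d_i &lt;= e:
        return per_grading  # consistent and accurate: full share
    if d_i >= cutoff * e:
        return 0.0          # too far off: no points
    # Linear decay between e and cutoff*e (cutoff is an assumed parameter).
    return per_grading * (cutoff * e - d_i) / ((cutoff - 1) * e)
      </preformat>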
    </sec>
    <sec id="sec-13">
      <title>5. IMPLEMENTATION</title>
      <p>Support for peer-assessment learning analytics is lacking in assessment analytics in general. We analyze the current implementation in the Moodle LMS, where the peer-assessment activity is implemented by the Workshop plugin.</p>
      <p>In a Workshop activity, each participant receives a grade for their submission and another grade for the quality of their assessments of other students' assignments. These grades are visible as separate grade items in the student's gradebook.</p>
      <p>
        The current implementation of Workshop calculates the assignment grade as a weighted mean of the received assessment gradings. Received gradings are not analyzed for reliability. A teacher who wishes to override or influence the calculated assignment grade can (a) additionally provide their own assessment and set its weight to a higher value, or (b) completely override the final grade. As we have argued here and in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we find this method inadequate. Therefore, we proposed alternative methods for the calculation of the final grade.
      </p>
      <p>Assessment grade calculation is more complex. The goal is to estimate the quality of each assessment. One assessment is singled out as the best one: the assessment closest to the mean value of all assessments. This selected assessment is assigned the highest grade; the other assessments receive grades based on their distance from the selected assessment. The teacher can influence this process by setting a parameter that determines how quickly a grade decreases with distance.</p>
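      <p>A simplified sketch of the calculation just described (not Moodle's actual code; the strictness parameter stands in for the plugin's comparison setting, and d is any metric on gradings).</p>
      <preformat>
def assessment_grades(assessments, d, top=100.0, strictness=1.0):
    """Best assessment = closest to the mean; others decay with distance."""
    m, n = len(assessments), len(assessments[0])
    mean = tuple(sum(a[i] for a in assessments) / m for i in range(n))
    best = min(assessments, key=lambda a: d(a, mean))
    return [max(0.0, top - strictness * d(a, best)) for a in assessments]
      </preformat>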
      <p>We are currently developing a new Moodle plugin for peer-assessment. This plugin will address the identified problems of the current implementation according to our model.</p>
    </sec>
    <sec id="sec-14">
      <title>6. CONCLUSION AND FURTHER RESEARCH</title>
      <p>Peer-assessment has many advantages for students (for example, the development of metacognitive skills) and for teachers (for example, saving the teacher's time), but there are several challenges related to its implementation, such as the calculation of the final grade, the reliability check, and awarding the evaluators.</p>
      <p>
        In this paper we propose new methods for the calculation of grades in peer-assessment, a measure of reliability, and a method for grading peer-evaluations in a peer-assessment exercise. These methods are based on an analysis of two distinguished scenarios that takes into account the number of available evaluators and evaluator expertise (domain knowledge and evaluation skills). We pursue an approach that models assessment learning analytics with a geometric model.
In [
In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] we analyzed a case study based on the master-level Project Management course at the University of Zagreb. Our analysis confirmed the need for a deeper analysis of reliability in peer-assessment. We expect to further explore data related to peer-assessment learning analytics in MOOCs. Additional data should lead to improvements of the model and to recommendations on the applicability of the scenarios, the parameters, and the analysis of the acceptable error of the assessment set.
      </p>
      <p>Also, we intend to implement our model (algorithms and the
supporting recommendation system) as a peer-assessment
plug-in for the Moodle LMS.</p>
      <p>Finally, we conclude that well-founded mathematical modeling, based on more than just descriptive statistics, should be used more often in learning analytics.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><surname>Brown</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Bull</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Pendlebury</surname>, <given-names>M.</given-names></string-name>, "<article-title>Assessing Student Learning in Higher Education</article-title>", Psychology Press, <year>1997</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><surname>Divjak</surname>, <given-names>B.</given-names></string-name>, "<article-title>Implementation of Learning Outcomes in Mathematics for Non-Mathematics Major by Using E-Learning</article-title>", in <source>Teaching Mathematics Online: Emergent Technologies and Methodologies</source>, A. A. Juan, M. A. Huertas, S. Trenholm, and C. Steegmann, Eds. <source>IGI Global</source>, <year>2012</year>, pp. 119-140.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><surname>Divjak</surname>, <given-names>B.</given-names></string-name>, "<article-title>Assessment of Complex, Non-Structured Mathematical Problems</article-title>", in IMA International Conference on Barriers and Enablers to Learning Maths, <year>2015</year>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><surname>Divjak</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Maretić</surname>, <given-names>M.</given-names></string-name>, "<article-title>Learning Analytics for e-Assessment: The State of the Art and One Case Study</article-title>", <source>CECIIS</source>, <year>2015</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><surname>Divjak</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Maretić</surname>, <given-names>M.</given-names></string-name>, "<article-title>Geometry for Learning Analytics</article-title>", <source>Scientific and Professional Information Journal of Croatian Society for Constructive Geometry and Computer Graphics</source>, KoG <volume>19</volume>, <year>2015</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><surname>Ellis</surname>, <given-names>C.</given-names></string-name>, "<article-title>Broadening the scope and increasing the usefulness of learning analytics: The case for assessment analytics</article-title>", <source>Br. J. Educ. Technol.</source>, vol. <volume>44</volume>, no. <issue>4</issue>, pp. 662-664, <year>2013</year>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><surname>Entwistle</surname>, <given-names>N. J.</given-names></string-name>, "<article-title>Teaching for understanding at university: deep approaches and distinctive ways of thinking</article-title>". Basingstoke, Hampshire: Palgrave Macmillan, <year>2009</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><surname>Ferguson</surname>, <given-names>R.</given-names></string-name>, "<article-title>The state of learning analytics in 2012: a review and future challenges</article-title>", <source>Tech. Rep. KMI-12-01</source>, March <year>2012</year>, p. <fpage>18</fpage>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><surname>Hwang</surname>, <given-names>C. L.</given-names></string-name>, <string-name><surname>Yoon</surname>, <given-names>K.</given-names></string-name>, "<article-title>Multiple Attribute Decision Making: Methods and Applications</article-title>", NY, Springer-Verlag, <year>1981</year>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><surname>Jonsson</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Svingby</surname>, <given-names>G.</given-names></string-name>, "<article-title>The use of scoring rubrics: Reliability, validity and educational consequences</article-title>", <source>Educational Research Review</source>, <year>2007</year>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name><surname>Papamitsiou</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Economides</surname>, <given-names>A. A.</given-names></string-name>, "<article-title>Learning Analytics and Educational Data Mining in Practice: A Systematic Literature Review of Empirical Evidence</article-title>", <source>Educational Technology &amp; Society</source>, <volume>17</volume>(<issue>5</issue>), <fpage>49</fpage>-<lpage>64</lpage>, <year>2014</year>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name><surname>Reyes</surname>, <given-names>Jacqueleen A.</given-names></string-name>, "<article-title>The Skinny on Big Data in Education: Learning Analytics Simplified</article-title>", <source>TechTrends: Linking Research and Practice to Improve Learning</source> <volume>59</volume> (April): 75-80, <year>2015</year>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name><surname>Sadler</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Good</surname>, <given-names>E.</given-names></string-name>, "<article-title>The impact of self- and peer-grading on student learning</article-title>", <source>Educ. Assess.</source>, vol. <volume>11</volume>, no. <issue>1</issue>, pp. 37-41, <year>2006</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>