Personalised Automated Assessments

Patricia Gutierrez and Nardine Osman and Carles Sierra
Artificial Intelligence Research Institute (IIIA-CSIC), Barcelona, Spain
{patricia, nardine, sierra}@iiia.csic.es

Abstract

Consider an evaluator, or an assessor, who needs to assess a large amount of information. For instance, think of a tutor in a massive open online course with thousands of enrolled students, a senior program committee member in a large peer review process who needs to decide the final marks of reviewed papers, or a user in an e-commerce scenario who needs to build up an opinion about products evaluated by others. When assessing a large number of objects, it is sometimes simply unfeasible to evaluate them all, and one often needs to rely on the opinions of others. In this paper we provide a model that uses peer assessments to generate expected assessments and tune them for a particular assessor. Furthermore, we are able to provide a measure of the uncertainty of our computed assessments and a ranking of the objects that should be assessed next in order to decrease the overall uncertainty of the calculated assessments.

1 Introduction

Consider an assessor who needs to assess a large amount of information. For instance, think of a tutor in a massive open online course with thousands of enrolled students, a senior program committee member in a large peer review process who needs to decide the final marks of reviewed papers, or a user in an e-commerce scenario who needs to build up an opinion about products evaluated by others. When assessing a large number of objects, it is sometimes simply unfeasible to evaluate them all, and one often needs to rely on the opinions of others. In the process of building up our opinion, some questions need to be answered, such as: How much should I trust the opinion of a peer? What should I believe given a peer's opinion? What should I believe when many peers give their different opinions? Which objects should be assessed next, such that the certainty of my belief improves?

This paper addresses these questions through the Personalised Automated ASsessment model (PAAS). PAAS uses peer assessment to calculate and predict assessments. However, what is fundamentally different from many previous works [Piech et al., 2013; de Alfaro and Shavlovsky, 2013; Walsh, 2014; Wu et al., 2015] is that the computed peer-based assessment is tuned to the perspective of a specific community member. PAAS aggregates peer assessments, giving more weight to those peers that are trusted by the specific community member for whom the automated assessments are computed. How much this specific member trusts a peer is then based on the similarity, or the evaluation rate, between his (past) assessments and the peer's (past) assessments over the same assignments. To compute such a trust measure, we build a trust network composed of direct and indirect trust values among community members. Direct trust values are derived from common assessments, while indirect trust is based on the notion of transitivity. We clarify that our target is not consensus building, but to accurately estimate unknown assessments from a specific member's point of view, based on the peers' assessments and reliability.

Finally, we are also able to provide a measure of the uncertainty of our calculated assessments and a ranking of the objects that should be assessed next in order to decrease the overall uncertainty of those calculated assessments.

2 The PAAS Model

2.1 Notation and Problem Definition

Let ε represent an assessor who needs to assess a large set of objects I, and let P be a set of peers that are able to assess objects in I.

We understand assessments as probability distributions over an evaluation space E at a given moment in time. For example, one can define the evaluation space for the quality of an English classroom homework as E = {poor, good, excellent}. The assessment {poor ↦ 0, good ↦ 0, excellent ↦ 1} would represent the highest assessment possible, whereas the assessment {poor ↦ 0, good ↦ 1/2, excellent ↦ 1/2} would represent that the quality of the homework is most probably between good and excellent, and so on.

We define an assessment e^α_i (also referred to as an evaluation or opinion) as a probability distribution over the evaluation space E, where α ∈ I is the object being evaluated and i ∈ {ε} ∪ P is the evaluator. We say e^α_i = {x_1 ↦ v_1, ..., x_n ↦ v_n}, where {x_1, ..., x_n} = E and v_i ∈ [0, 1] represents the value assigned to each element x_i ∈ E, with the condition that $\sum_{i=1}^{n} v_i = 1$.
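To make the notation concrete, the following small sketch (illustrative only; the dictionary representation and helper names are ours, not part of the model) shows how such assessments could be represented and validated in Python:

    import math

    # An assessment is a probability distribution over the evaluation space,
    # represented here as a dict from space elements to values in [0, 1].
    E = ["poor", "good", "excellent"]

    def is_valid_assessment(e, space=E):
        """Check that e covers the space, has values in [0, 1], and sums to 1."""
        return (set(e) == set(space)
                and all(0.0 <= v <= 1.0 for v in e.values())
                and math.isclose(sum(e.values()), 1.0))

    # The highest possible assessment, and one between good and excellent:
    e_top = {"poor": 0.0, "good": 0.0, "excellent": 1.0}
    e_mid = {"poor": 0.0, "good": 0.5, "excellent": 0.5}
    assert is_valid_assessment(e_top) and is_valid_assessment(e_mid)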
Finally, we define L as the history of all assessments performed, and O_α ⊂ L as the set of past peer assessments over the object α.

The ultimate goal of our work is to compute the probability distribution of ε's evaluation over a certain object α, given the evaluations of several peers over that same object α. In other words, what is the probability that ε's evaluation is x given the set of peers' evaluations O_α? Such an expectation can be formalised with the conditional probability:

$$p(X{=}x \mid O_\alpha).$$

To calculate the above conditional probability, we take into account every particular evaluation in O_α. In other words, expectations (or probabilities) are calculated for each individual evaluation in O_α, before those expectations are aggregated into p(X=x | O_α). The probability that ε's assessment is x given a particular evaluation e^α_µ ∈ O_α is formalised as follows:

$$p(X{=}x \mid e^\alpha_\mu).$$

The more general probability p(X=x | O_α) is then defined as an aggregation of the individual probabilities:

$$p(X{=}x \mid O_\alpha) = \bigoplus_{e^\alpha_\mu \in O_\alpha} p(X{=}x \mid e^\alpha_\mu)$$

where the exact definition of the aggregation ⊕ is presented later on in Section 2.4.

We strongly base the intuition behind the computation of the individual conditional probabilities on the notion of trust between peers based on previous experiences, where trust is understood in this context as the expected similarity between the assessments given by those peers. In other words, our intuition is that we expect ε will tend to agree with µ's assessments if his trust in µ is high; otherwise, ε's evaluation will probably be different. We then perform a sort of analogical reasoning: if in the past µ gave opinions that were dissimilar to a certain degree from ε's opinions, then this will probably happen again now.

The remainder of this section is divided accordingly. We first describe in detail how the measure of trust between peers is calculated (Section 2.2). Then, we illustrate how to calculate ε's assessment of an object α given µ's assessment of α and ε's trust in µ's assessments (Section 2.3); in other words, we present an approach for calculating the individual probability p(X=x | e^α_µ). We then illustrate how to combine those probabilities to build the probability distribution of ε's assessments given the assessments of several peers (Section 2.4); in other words, we present an approach for calculating the probability p(X=x | O_α). Finally, we provide a measure of the uncertainty of the computed assessments and a ranking of the objects that should be assessed next by ε in order to decrease that uncertainty (Section 2.5).

2.2 Step 1: How much should I trust a peer?

ε needs to decide how much he or she can trust the assessment of a peer µ. We define this trust measure based on the following two intuitions. Our first intuition states that if ε and µ have both assessed the same object, then the similarity of their assessments can give a hint of how close their judgements are. However, cases may arise where there are simply no objects evaluated by both ε and µ. In such a case, one may think of simply neglecting µ's assessment, as ε would not know how much to trust it. Our second intuition, however, proposes an alternative approach for such cases, where we approximate the unknown trust between ε and µ by looking into a chain of trust between ε and µ through other peers. Roughly speaking, we rely on the transitive notion: "if ε trusts µ, and µ trusts µ′, then ε will likely trust µ′". In the following, we define these two intuitions through two different types of trust relations: direct trust and indirect trust.

Direct Trust

Direct trust is the trust relation that emerges between evaluators that have assessed one or more objects in common. One possible approach is to measure such a relation as an aggregation of their evaluations' similarity over those objects assessed in common. For instance, let A_{i,j} = {α | e^α_i, e^α_j ∈ L} be the set of objects that have been assessed by both evaluators i and j. Then different definitions for the direct trust between i and j based on the similarity between two assessments, sim(e^α_i, e^α_j), may be adopted (a computational sketch of these aggregations is given after the list), such as:

• The average of the similarities for all commonly assessed objects:

$$T_D(i,j) = \frac{\sum_{\alpha \in A_{i,j}} sim(e^\alpha_i, e^\alpha_j)}{|A_{i,j}|}$$

• The conjunction of the similarities for all commonly assessed objects:

$$T_D(i,j) = \bigwedge_{\alpha \in A_{i,j}} sim(e^\alpha_i, e^\alpha_j)$$

• The Pearson coefficient [Upton and Cook, 2008], or linear correlation between i and j, for all commonly assessed objects:

$$T_D(i,j) = \frac{\sum_{\alpha \in A_{i,j}} sim(e^\alpha_i, \bar{e}_i) \cdot sim(e^\alpha_j, \bar{e}_j)}{\sqrt{\sum_{\alpha \in A_{i,j}} sim(e^\alpha_i, \bar{e}_i)^2}\ \sqrt{\sum_{\alpha \in A_{i,j}} sim(e^\alpha_j, \bar{e}_j)^2}}$$

where ē_i, ē_j are the means of the evaluations performed over the set A_{i,j} by i and j respectively.
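As an illustration, the following sketch computes the three aggregations for precomputed similarity values; the function names, and the use of min as the conjunction (a common T-norm choice), are our own illustrative assumptions:

    import math

    def avg_trust(sims):
        """Average of the similarities over commonly assessed objects."""
        return sum(sims) / len(sims)

    def conj_trust(sims):
        """Conjunction of the similarities, realised here as the min T-norm."""
        return min(sims)

    def pearson_trust(sims_i, sims_j):
        """Pearson-style correlation between the similarity-to-mean scores."""
        num = sum(a * b for a, b in zip(sims_i, sims_j))
        den = (math.sqrt(sum(a * a for a in sims_i))
               * math.sqrt(sum(b * b for b in sims_j)))
        return num / den if den else 0.0

    sims = [0.9, 0.8, 1.0]       # sim(e^α_i, e^α_j) for each α in A_{i,j}
    print(avg_trust(sims), conj_trust(sims))
    sims_i = [0.9, 0.4, 0.7]     # sim(e^α_i, ē_i) for each α in A_{i,j}
    sims_j = [0.8, 0.5, 0.6]     # sim(e^α_j, ē_j) for each α in A_{i,j}
    print(pearson_trust(sims_i, sims_j))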
However, when we calculate such aggregations we lose relevant information. For instance, we are not able to tell if j usually under-rates with respect to i, if it usually over-rates, or neither. We are also not able to tell whether the dissimilarities between i's and j's evaluations are highly variable or not.

To cope with such loss of information, we define the direct trust between two peers i and j as a probability distribution T_{D_{i,j}} : [0, 1] → [0, 1] built from the historical data of previous evaluations performed by i and j. This probability distribution describes, as we will explain shortly, the expected similarity or the expected evaluation rate between i's and j's assessments. The support of the distribution is [0, 1], since both the expected similarity and the expected evaluation rate are in the range [0, 1], as we will see shortly; and the range of the distribution is [0, 1], as this is a probability distribution and the range of any probability is [0, 1]. Note that we do not consider here any summarising measure for trust that would translate that distribution into a single value, although a number of measures could be used, such as the average similarity (as the centre of gravity of the distribution) or entropy (as a measure of the uncertainty of the distribution).

When defining T_{D_{i,j}} we distinguish two cases: (1) a first case with a non-ordered evaluation space, such as E = {visionary, original, sound}; and (2) a second case with an ordered evaluation space, such as E = {bad, good, excellent}. In the second case, we are interested in maintaining information about whether a peer under-rates or over-rates with respect to another peer; therefore we are interested in the expected evaluation rate between i and j. In the first case, this is not an issue, as assessments cannot be ordered and the notion of under/over-rating does not exist; therefore we are rather interested in the expected similarity between i's and j's assessments. Next we detail the trust probability distributions T_{D_{i,j}} built for both cases.

• Non-Ordered Case. In the non-ordered case, we are interested in the similarity between i's and j's assessments. As such, the support of the distribution representing i's direct trust in j (i.e. the x-axis of T_{D_{i,j}}) consists of the possible degrees of similarity between i's and j's assessments. The trust distribution T_{D_{i,j}}(x) then describes the probability that peers i and j evaluate an object with a similarity x (or the probability that the similarity of their evaluations is x).

• Ordered Case. In the ordered case, we are interested in the evaluation rate e_j/e_i between evaluations made by peers i and j. If e_j/e_i = 1, this means that i and j provide the same evaluation. If e_j/e_i > 1, this means that j over-rates with respect to i. If e_j/e_i < 1, this means that j under-rates with respect to i.

We normalise the evaluation rate to values between 0 and 1. To do so, we require a non-decreasing function r : ℝ → [0, 1] such that lim_{x→∞} r(x) = 1, and for convenience we constrain r(1) = 0.5. We adopt the following normalised evaluation rate function, which satisfies these properties:

$$r(x) = e^{\ln(1/2)/x} \tag{1}$$

As such, the support of the distribution representing i's direct trust in j (i.e. the x-axis of T_{D_{i,j}}) consists of the possible normalised evaluation rates between i and j. The trust distribution T_{D_{i,j}}(x) then describes the probability that i and j would assess an object with a normalised evaluation rate x.
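A quick numerical check of Equation 1 (an illustrative sketch; the function name rate is ours):

    import math

    def rate(x):
        """Normalised evaluation rate of Equation 1: r(x) = e^(ln(1/2)/x)."""
        return math.exp(math.log(0.5) / x)

    # r is non-decreasing, r(1) = 0.5, and r(x) -> 1 as x grows:
    print(rate(0.5))   # 0.25   (j strongly under-rates w.r.t. i)
    print(rate(1.0))   # 0.5    (i and j provide the same evaluation)
    print(rate(2.0))   # ~0.707 (j over-rates w.r.t. i)
    print(rate(1000))  # ~0.999 (approaches 1 in the limit)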
In what follows, we explain how we build direct trust distributions computationally, based on previous experiences. Initially, the direct trust distribution between any two peers is the uniform distribution F = {1/n, ..., 1/n} (describing ignorance), where n is the size of the distribution's support. Every new assessment made then updates the trust distributions accordingly. Consider a new assessment e^α_i. The distribution T_{D_{i,j}}, for all j such that A_{i,j} ≠ ∅, is updated as follows (a computational sketch of this procedure is given after the list):

1. We find the element x in T_{D_{i,j}}'s support whose probability needs to be adjusted. That is, we calculate x = sim(e^α_j, e^α_i) in the non-ordered case (where the definition of sim is domain-dependent and outside the scope of this paper, although we do note that several approaches may be adopted, such as using semantic similarity measures [Li et al., 2003]), or x = r(e^α_j / e^α_i) in the ordered case (Equation 1).

2. We update the probability of the single expectation x in T_{D_{i,j}} accordingly:

$$p(X{=}x) = p(X{=}x) + \gamma \cdot (1 - p(X{=}x)) \tag{2}$$

The update is based on increasing the latest probability p(X=x) by a fraction γ ∈ [0, 1] of the total potential increase (1 − p(X=x)). For instance, if the probability of x is 0.6 and γ is 0.1, then the new probability of x becomes 0.6 + 0.1 · (1 − 0.6) = 0.64. We note that the ideal value of γ should be closer to 0 than to 1, so that one single experience does not result in considerable changes in the distribution. In other words, a single assessment cannot result in a considerable change in the probability distribution; considerable changes can only be the result of information learned from the accumulation of many assessments.

3. We normalise T_{D_{i,j}} by updating the remaining expectations following the entropy-based approach of [Sierra and Debenham, 2005]. The entropy-based approach updates T_{D_{i,j}} such that: (1) the value p(X=x) is maintained, and (2) the resulting distribution has a minimal relative entropy with respect to the previous one. In other words, we look for a distribution that contains the updated probability value p(X=x) and that is at a minimal distance from the original T_{D_{i,j}} (as the relative entropy is a measure of the difference between two probability distributions). Following this approach, we update T_{D_{i,j}}(X) as follows:

$$T_{D_{i,j}}(X) = \arg\min_{P'(X)} \sum_{x'} p(X{=}x') \log \frac{p(X{=}x')}{p'(X{=}x')} \ \text{ such that } \ p(X{=}x) = p'(X{=}x) \tag{3}$$

where p(X=x′) is a probability value in T_{D_{i,j}}, p′(X=x′) is a probability value in P′, and p(X=x) = p′(X=x) specifies the constraint that needs to be satisfied by the resulting distribution.
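The sketch below illustrates the update of Equation 2 followed by renormalisation; we rescale the remaining points proportionally, which is the minimiser of the relative entropy in Equation 3 when the only constraint fixes a single point's probability. The names are ours, and we assume the observed value x coincides with a support point (in practice one would discretise):

    def update_trust(dist, x, gamma=0.1):
        """Update a discrete trust distribution after observing expectation x.

        dist  -- dict mapping support points to probabilities (sums to 1)
        x     -- observed similarity or normalised evaluation rate
        gamma -- learning fraction in [0, 1], kept small (Equation 2)
        """
        p_old = dist[x]
        p_new = p_old + gamma * (1.0 - p_old)           # Equation 2
        # Rescale the other points so the distribution sums to 1 again;
        # proportional rescaling realises the minimum-relative-entropy
        # normalisation of Equation 3 for this single-point constraint.
        scale = (1.0 - p_new) / (1.0 - p_old) if p_old < 1.0 else 0.0
        return {k: (p_new if k == x else v * scale) for k, v in dist.items()}

    # Uniform (ignorance) prior over a support of 5 values:
    support = [0.1, 0.3, 0.5, 0.7, 0.9]
    T = {s: 1.0 / len(support) for s in support}
    T = update_trust(T, 0.5)   # one agreement observed; mass shifts to 0.5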
Indirect Trust

Given a direct trust relation between peers i and j and between peers j and k, the question now is: what can we say about the indirect trust between peers i and k when i and k have no objects assessed in common? In other words, given the direct trust distributions T_{D_{i,j}} and T_{D_{j,k}}, what can we say about the indirect trust distribution T_{I_{i,k}}?

As with direct trust distributions, we distinguish two cases: a first case where assessments cannot be ordered and thus trust is based on a similarity measure sim; and a second case where assessments can be ordered and thus trust is based on the normalised evaluation rate function r(x) = e^{ln(1/2)/x}.

• Non-Ordered Case. In this case, we want to preserve the fundamental triangular inequality property of similarity functions, which says that: T-norm(sim(a, b), sim(b, c)) ≤ sim(a, c). As with T_{D_{i,k}}, the support (or the x-axis) of T_{I_{i,k}} consists of the possible degrees of similarity between i's and k's assessments. But since these degrees of similarity should satisfy the T-norm, the support is defined as the set:

$$supp(T_{I_{i,k}}) = \{x_{ik} = \text{T-norm}(x_{ij}, x_{jk}) \mid x_{ij} \in supp(T_{D_{i,j}}) \wedge x_{jk} \in supp(T_{D_{j,k}})\}$$

where supp represents the support of a distribution. We then compute the probabilities of the expectations of T_{I_{i,k}} as follows:

$$\{p(X{=}x_{ik}{=}\text{T-norm}(x_{ij}, x_{jk})) = T_{D_{i,j}}(x_{ij}) \cdot T_{D_{j,k}}(x_{jk}) \mid x_{ij} \in supp(T_{D_{i,j}}) \wedge x_{jk} \in supp(T_{D_{j,k}})\} \tag{4}$$

This could result in more than one probability computed for the same expectation x_{ik}. As such, we then add up all the probabilities that correspond to the same expectation x_{ik}.

We note that we follow a conservative approach by adopting the product operator (Equation 4), which is a T-norm that gives the smallest possible values, as we prefer not to overrate indirect trust values since they are not inferred directly from historical data. Of course, other operators could also be used, such as the min function.

• Ordered Case. In this case, we want to preserve the property e_j/e_i · e_k/e_j = e_k/e_i with respect to the evaluations performed by i, j and k. For instance, if the evaluation rate between e_j and e_i is 0.5 (j under-rates by 50% with respect to i) and the evaluation rate between e_k and e_j is 0.5 (k under-rates by 50% with respect to j), then the evaluation rate between e_k and e_i should be 0.25 (so k under-rates by 75% with respect to i).

As above, the support (or the x-axis) of T_{I_{i,k}} consists of the possible normalised evaluation rates between i's and k's assessments. The support is then defined as the set:

$$supp(T_{I_{i,k}}) = \{x_{ik} = x_{ij} \cdot x_{jk} \mid x_{ij} \in supp(T_{D_{i,j}}) \wedge x_{jk} \in supp(T_{D_{j,k}})\}$$

We then compute the probabilities of the expectations of T_{I_{i,k}} as follows:

$$\{p(X{=}x_{ik}{=}x_{ij} \cdot x_{jk}) = T_{D_{i,j}}(x_{ij}) \cdot T_{D_{j,k}}(x_{jk}) \mid x_{ij} \in supp(T_{D_{i,j}}) \wedge x_{jk} \in supp(T_{D_{j,k}})\} \tag{5}$$

Again, this could result in more than one probability computed for the same expectation x_{ik}. As such, we then add up all the probabilities that correspond to the same expectation x_{ik}.
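The following sketch composes two direct trust distributions into an indirect one as in Equations 4 and 5 (the function is ours; the product is used both as the T-norm on support points and as the combination of probabilities, and the rounding of support points is an illustrative discretisation choice):

    from collections import defaultdict

    def compose_trust(T_ij, T_jk, tnorm=lambda a, b: a * b, ndigits=3):
        """Indirect trust T_ik from T_ij and T_jk (Equations 4 and 5).

        Every pair of support points is combined with the T-norm (here the
        product); probabilities multiply, and probabilities landing on the
        same combined support point are added up.
        """
        T_ik = defaultdict(float)
        for x_ij, p_ij in T_ij.items():
            for x_jk, p_jk in T_jk.items():
                x_ik = round(tnorm(x_ij, x_jk), ndigits)
                T_ik[x_ik] += p_ij * p_jk
        return dict(T_ik)

    T_ij = {0.5: 0.8, 1.0: 0.2}
    T_jk = {0.5: 0.5, 1.0: 0.5}
    print(compose_trust(T_ij, T_jk))
    # {0.25: 0.4, 0.5: 0.5, 1.0: 0.1} -- 0.5 collects mass from two pairs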
The calculations presented above provide an approach for calculating the indirect trust between two peers i and k when those peers are linked through a direct trust chain passing through only one intermediate peer j. For direct trust chains of increasing length between i and k, the previous process is iterated. For instance, if there is a direct trust chain linking i to j, j to m, and m to k, then we first compute the indirect trust distribution T_{I_{i,m}} from the direct trust distributions T_{D_{i,j}} and T_{D_{j,m}}, and then we compute the indirect trust distribution T_{I_{i,k}} from the direct/indirect trust distributions T_{I_{i,m}} and T_{D_{m,k}}, following the same approach as above.

When multiple chains of direct trust connect two peers (e.g. a chain linking i to j and j to k, and another chain linking i to m and m to k), we obtain multiple indirect trust distributions (one from every chain). In those cases, we pick the resulting distribution that is most optimistic. In other words, while our approach to calculating indirect trust follows a pessimistic approach (through our choice of the product operator in Equations 4 and 5), we now choose the most optimistic of the pessimistic outcomes. To do that, we choose the distribution that is closest to the equivalence distribution, which is a distribution describing that the evaluations of two peers are equivalent. In the non-ordered case, the equivalence distribution is P_E(1) = 1; that is, the similarity between the two peers is maximal. In the ordered case, the equivalence distribution is P_E(0.5) = 1; that is, the normalised evaluation rate between the two peers is 0.5, which implies that they always provide the same evaluation. The distance between an indirect trust distribution T_{I_{i,k}} and the equivalence distribution P_E can be calculated as:

$$emd(T_{I_{i,k}}, P_E) \tag{6}$$

where emd is the earth mover's distance, which calculates the distance between two probability distributions [Rubner et al., 1998].¹ We note that the range of emd is [0, 1], where 0 represents the minimum distance and 1 represents the maximum possible distance.

In the remainder of this paper, when we refer explicitly to a direct or indirect trust distribution between peers i and j, we refer to such a distribution as T_{D_{i,j}} or T_{I_{i,j}}, respectively; whereas when we refer generically to a trust distribution that could be either, we refer to it as T_{i,j}.

¹ If probability distributions are viewed as piles of dirt, then the earth mover's distance measures the minimum cost of transforming one pile into the other. This cost is equivalent to the 'amount of dirt' times the distance by which it is moved, or the distance between elements of the probability distribution's support.
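For one-dimensional distributions over a common ordered support, the earth mover's distance reduces to the accumulated mass difference swept across the support, a standard closed form; the sketch below (our own helper) computes it, and for supports within [0, 1] the result is already in [0, 1] as stated above:

    def emd_1d(P, Q, support):
        """Earth mover's distance between two distributions over a common
        ordered 1-D support (dicts from support points to probabilities)."""
        work, carry = 0.0, 0.0
        for a, b in zip(support, support[1:]):
            carry += P.get(a, 0.0) - Q.get(a, 0.0)   # dirt left to move right
            work += abs(carry) * (b - a)             # cost of moving it to b
        return work

    # Distance of an indirect trust distribution to the ordered-case
    # equivalence distribution P_E(0.5) = 1:
    support = [0.25, 0.5, 1.0]
    T_ik = {0.25: 0.4, 0.5: 0.5, 1.0: 0.1}
    P_E  = {0.5: 1.0}
    print(emd_1d(T_ik, P_E, support))   # 0.15: fairly close to equivalence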
Trust Graph

Direct and indirect trust relations in a community can be represented by a weighted directed graph. We define a community's trust graph as:

$$G = \langle N, E, w \rangle$$

where the set of nodes N is the set of evaluators in {ε} ∪ P, E ⊆ N × N is the set of edges between evaluators with direct or indirect trust relations, and w : E ↦ [0, 1]^n is the weight of an edge, described as a trust probability distribution.

D ⊂ E is the set of edges that link evaluators with direct trust relations: D = {(i, j) ∈ E | T_{D_{i,j}} ≠ ⊥}. Similarly, I ⊂ E is the set of edges that connect evaluators with indirect trust relations: I = {(i, j) ∈ E | T_{I_{i,j}} ≠ ⊥} \ D. We note that the set of edges E is then composed of the union of the sets of direct and indirect edges: E = D ∪ I. Weights in w describe direct and indirect trust probability distributions and are defined as follows:

$$w(i,j) = \begin{cases} T_{D_{i,j}}, & \text{if } (i,j) \in D \\ T_{I_{i,j}}, & \text{if } (i,j) \in I \end{cases}$$

Our goal is to determine how much a particular evaluator ε can trust a peer µ, so the trust graph is constructed with respect to ε's point of view only. Therefore, we maintain a trust graph of the whole community containing all the direct edges between peers (as they are needed to calculate indirect trust relations), but we only maintain the indirect edges that connect ε with the rest of the peers.
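A minimal sketch of such a graph (the representation and names are ours; each edge carries a trust distribution plus a flag recording whether it is direct, so that I excludes D as defined above):

    class TrustGraph:
        """Nodes are evaluator ids; each edge (i, j) stores its trust
        distribution and whether the relation is direct or indirect."""

        def __init__(self):
            self.edges = {}   # (i, j) -> {"dist": {...}, "direct": bool}

        def add_direct(self, i, j, dist):
            self.edges[(i, j)] = {"dist": dist, "direct": True}

        def add_indirect(self, i, j, dist):
            # Indirect edges never overwrite direct ones (I excludes D).
            if (i, j) not in self.edges:
                self.edges[(i, j)] = {"dist": dist, "direct": False}

        def trust(self, i, j):
            """w(i, j): the trust distribution on edge (i, j), if any."""
            e = self.edges.get((i, j))
            return e["dist"] if e else None

    g = TrustGraph()
    g.add_direct("eps", "mu", {0.5: 0.2, 1.0: 0.8})
    g.add_indirect("eps", "nu", {0.25: 0.4, 0.5: 0.5, 1.0: 0.1})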
Information Decay

An important notion in our proposal is the notion of the decay of information: we say the integrity of information decreases with time. In other words, the information provided by a trust probability distribution should lose its value over time and decay towards a default value. We refer to this default value as the decay limit distribution D. For instance, D may be the uniform distribution, which describes that trust information learned from past experiences tends towards ignorance over time.

To implement such a decay mechanism, we need to:

1. Record all evaluations e^α_i ∈ L made at time t with a timestamp t, noted e^{αt}_i.

2. Record all direct trust distributions T_{D_{i,j}} with a timestamp t, noted T^t_{D_{i,j}}, where t is the timestamp of the last evaluation that modified the trust distribution. The first time T_{D_{i,j}} is calculated, t is the timestamp of the latest of the two evaluations leading to this calculation. (Recall that it is the similarity between two evaluations, or their evaluation rate, that updates the probability distribution.) Then, every time a new evaluation with timestamp t′ > t is considered to update T^t_{D_{i,j}}, T^t_{D_{i,j}} is first decayed from t to t′ before the distribution is updated.

3. Record all indirect trust distributions T_{I_{i,j}} with a timestamp t, noted T^t_{I_{i,j}}, where t is the time the distribution is calculated. Every time T_{I_{i,j}} is calculated, all probability distributions involved in this calculation first need to be decayed to the time of calculation t. The time of calculation is usually the latest timestamp amongst the timestamps of the distributions involved in this calculation.

Information in a trust probability distribution T_{i,j} decays from t to t′ (where t′ > t) as follows:

$$T^{t \to t'}_{i,j} = \Lambda(D, T^t_{i,j}) \tag{7}$$

where Λ is the decay function satisfying the property lim_{t′→∞} T^{t→t′}_{i,j} = D. One possible definition for Λ could be:

$$T^{t \to t'}_{i,j} = \nu^{\Delta_{t,t'}} \cdot T^t_{i,j} + (1 - \nu^{\Delta_{t,t'}})\, D \tag{8}$$

where ν is the decay rate, and:

$$\Delta_{t,t'} = \begin{cases} 0, & \text{if } t' - t < \omega \\ 1 + \dfrac{t' - t}{t_{max}}, & \text{otherwise} \end{cases}$$

The definition of Δ_{t,t′} above serves the purpose of establishing a minimum grace period, determined by the parameter ω, during which the information does not decay; once the grace period is reached, the information starts decaying. The parameter t_{max}, which may be defined in terms of multiples of ω, controls the pace of decay. The main idea behind this is that after the grace period the decay happens very slowly; in other words, ν^{Δ_{t,t′}} decreases very slowly.
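A sketch of the decay of Equations 7 and 8 (the parameter values are illustrative, not prescribed by the model):

    def decay(T, D, t, t_prime, nu=0.9, omega=10.0, t_max=100.0):
        """Decay trust distribution T (timestamped t) towards the decay
        limit distribution D at a later time t_prime (Equations 7 and 8)."""
        dt = t_prime - t
        delta = 0.0 if dt < omega else 1.0 + dt / t_max   # grace period omega
        a = nu ** delta                   # equals 1 inside the grace period
        support = set(T) | set(D)
        return {x: a * T.get(x, 0.0) + (1 - a) * D.get(x, 0.0)
                for x in support}

    support = [0.25, 0.5, 1.0]
    D = {x: 1.0 / len(support) for x in support}   # uniform decay limit
    T = {0.25: 0.1, 0.5: 0.7, 1.0: 0.2}
    print(decay(T, D, t=0.0, t_prime=5.0))    # unchanged: within grace period
    print(decay(T, D, t=0.0, t_prime=50.0))   # pulled towards the uniform D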
2.3 Step 2: What to believe when a peer gives an opinion?

Given a peer assessment e^α_µ, the question now is how to compute the probability distribution of ε's evaluation. In other words, what is the probability that ε's evaluation of α is x, given that µ evaluated α with e^α_µ? As illustrated earlier, this is expressed as the conditional probability:

$$P(X{=}x \mid e^\alpha_\mu)$$

To calculate this conditional probability, the intuition is that ε would tend to agree with µ's evaluation if his trust in µ (that is, the expected similarity between their assessments, or the expected evaluation rate between their assessments) is high; otherwise, ε's evaluation would probably be different. We then perform a sort of analogical reasoning: if in the past µ gave assessments that were dissimilar to a certain degree from ε's opinions, or with a certain evaluation rate with respect to ε, then this will probably happen again now.

We then calculate the above conditional probability based on the following desired properties:

• If T_{ε,µ} is a flat distribution (i.e. a distribution representing ignorance), then P(X | e^α_µ) should also be a flat distribution. That is, the closer ε's trust in µ is to ignorance, the less information µ is giving to ε with his/her assessment.

• The degree of belief e^α_ε = x should increase for those points x whose similarity (or evaluation rate, in the ordered case) to e^α_µ is high (i.e. for higher values of T_{ε,µ}).

• The degree of belief e^α_ε = x should decrease for those points x whose similarity (or evaluation rate, in the ordered case) to e^α_µ is low (i.e. for lower values of T_{ε,µ}).

Formally, these properties are achieved by defining the probabilities accordingly (where the denominator of the following two equations, Equations 9 and 10, is used for normalisation to ensure that the resulting distribution is a probability distribution):

• Non-Ordered Case.

$$p(X{=}x \mid e^\alpha_\mu) = \frac{e^{T_{\epsilon,\mu}(sim(e^\alpha_\mu, x)) \cdot I(T_{\epsilon,\mu})}}{\sum_{x' \in E} e^{T_{\epsilon,\mu}(sim(e^\alpha_\mu, x')) \cdot I(T_{\epsilon,\mu})}} \tag{9}$$

• Ordered Case.

$$p(X{=}x \mid e^\alpha_\mu) = \frac{e^{T_{\epsilon,\mu}(r(e^\alpha_\mu / x)) \cdot I(T_{\epsilon,\mu})}}{\sum_{x' \in E} e^{T_{\epsilon,\mu}(r(e^\alpha_\mu / x')) \cdot I(T_{\epsilon,\mu})}} \tag{10}$$

where I(T_{ε,µ}) is a measure of how informative the probability distribution T_{ε,µ} is. We calculate I(T_{ε,µ}) as:

$$I(T_{\epsilon,\mu}) = 1 - H(T_{\epsilon,\mu}) \tag{11}$$

where H describes the entropy of a probability distribution. In other words, the lower the entropy of the distribution, the more informative it is, and vice versa.

We finally define the probability distribution of ε's expected evaluation given µ's opinion accordingly as P(X | e^α_µ), where X varies over the evaluation space E.
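A sketch of Equations 9 to 11 for the ordered case. We make three illustrative assumptions of our own: evaluation-space elements are encoded as numeric marks and µ's assessment is taken as a single mark (e.g. the mean of its distribution) so the rate e^α_µ / x is computable; the trust distribution is discretised, so we read it at the nearest support point; and the entropy is normalised to [0, 1] so that I stays in range:

    import math

    def entropy(dist):
        """Entropy of a discrete distribution, normalised to [0, 1]."""
        h = -sum(p * math.log(p) for p in dist.values() if p > 0)
        return h / math.log(len(dist)) if len(dist) > 1 else 0.0

    def rate(x):
        """Normalised evaluation rate of Equation 1."""
        return math.exp(math.log(0.5) / x)

    def nearest(T, x):
        """Trust probability at the support point nearest to x."""
        return T[min(T, key=lambda s: abs(s - x))]

    def belief_given_opinion(T, mark_mu, marks):
        """P(X = x | e^α_µ) for the ordered case (Equations 10 and 11)."""
        info = 1.0 - entropy(T)                          # Equation 11
        scores = {x: math.exp(nearest(T, rate(mark_mu / x)) * info)
                  for x in marks}
        z = sum(scores.values())                         # normalisation
        return {x: s / z for x, s in scores.items()}

    marks = [1, 2, 3]                                    # numeric encoding of E
    T = {0.25: 0.05, 0.5: 0.85, 0.71: 0.05, 1.0: 0.05}   # rate is usually 0.5
    print(belief_given_opinion(T, mark_mu=2, marks=marks))
    # belief peaks at mark 2: ε's trust says µ usually agrees with ε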
2.4 Step 3: What to believe when many give opinions?

In the previous section we computed P(X | e^α_µ); that is, the probability distribution of ε's evaluation of α given the evaluation of a peer µ on α. But what does ε do when there is more than one peer assessing α?

Given the set of opinions O_α describing a set of peer evaluations over the object α, we define the probability of ε's assessment being x as follows:

$$p(X{=}x \mid O_\alpha) = \frac{\prod_{\mu \in O_\alpha} p(X{=}x \mid e^\alpha_\mu)}{\sum_{x' \in E} \prod_{\mu \in O_\alpha} p(X{=}x' \mid e^\alpha_\mu)} \tag{12}$$

In other words, the probability of ε's assessment of α being x given the set of opinions over α is an aggregation (a product in this case) of the probabilities of ε's assessment of α being x given each evaluation e^α_µ ∈ O_α.

We then define the probability distribution of ε's expected evaluation given all opinions in O_α as P(X | O_α), where X varies over the evaluation space E.

We note that instead of the product operator ∏ other connectives could be used, for instance the min operator. However, using the minimum operator does not take into account the number of assessments made; that is, having the assessments of 20 peers could be equivalent to having the assessment of just one peer. In fact, the proposed aggregation of Equation 12 ensures that:

• The larger the number of identical opinions, the less uncertain the final probability distribution is, and

• The more trusted the opinions, the less uncertain the final probability distribution is.

Finally, to translate the final assessment from a probability distribution P(X | O_α) into a single value, we calculate the mean (average) of the distribution and select the closest mark to that mean.

2.5 Step 4: What should be evaluated next?

The previous three steps have provided a model to calculate automated assessments of objects that have not been assessed by ε, based on peers' opinions. The level of uncertainty of the automated assessments generated by our model can be calculated as the uncertainty of the probability distribution of ε's expected evaluation based on those peers' opinions, P(X | O_α). This level of uncertainty is measured by the distribution's entropy:

$$H(P(X \mid O_\alpha))$$

The question that naturally arises then is: which objects should be assessed next by ε to decrease such uncertainties? For example, how many more assignments should a tutor evaluate so that the uncertainty of the calculated assessments becomes acceptable? We suggest that ε evaluate the objects with maximum uncertainty, or maximum entropic value. The ranking of objects with respect to their entropic value is then defined as follows:

$$Rank(\alpha) = 1 - H(P(X \mid O_\alpha)) = 1 + \sum_{x \in E} p(X{=}x \mid O_\alpha) \ln p(X{=}x \mid O_\alpha) \tag{13}$$

ε can then continue to evaluate objects one by one until the uncertainty of the automated assessments becomes less than some predefined acceptable uncertainty threshold.
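Putting Steps 3 and 4 together, the sketch below aggregates per-peer belief distributions with Equation 12 and scores objects with Equation 13; the names are ours, and the distributions b1 and b2 are made-up stand-ins for the outputs of Equations 9 and 10:

    import math

    def aggregate(per_peer_beliefs):
        """Equation 12: normalised product of the per-peer distributions
        P(X | e^α_µ) for one object."""
        marks = per_peer_beliefs[0].keys()
        prod = {x: math.prod(b[x] for b in per_peer_beliefs) for x in marks}
        z = sum(prod.values())
        return {x: v / z for x, v in prod.items()}

    def rank(agg):
        """Equation 13: 1 - H(P(X | O_α)); lower values mean more
        uncertainty, so the lowest-ranked objects are evaluated first."""
        return 1.0 + sum(p * math.log(p) for p in agg.values() if p > 0)

    # Two peers pointing at the same mark leave less uncertainty than one:
    b1 = {1: 0.2, 2: 0.6, 3: 0.2}
    b2 = {1: 0.1, 2: 0.7, 3: 0.2}
    one = aggregate([b1])
    two = aggregate([b1, b2])
    print(rank(one) < rank(two))   # True: agreeing opinions reduce entropy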
3 Conclusions and Future Work

In this paper we have presented the Personalised Automated ASsessments model (PAAS), a trust-based assessment service that helps compute group assessments from the perspective of a specific community member. This computation essentially aggregates peer assessments, giving more weight to those peers that are trusted by the specific community member for whom the automated assessments are computed. How much this specific member trusts a peer is then based on the similarity, or the evaluation rate, between his (past) assessments and the peer's (past) assessments over the same assignments.

The proposed work is an extension of the work carried out in [Gutierrez et al., submitted for publication]. In fact, the COMAS model is a much simplified version of the non-ordered case, as it assumes that the probability of the similarity between two assessors is 1 for the aggregation of the similarities of past evaluations over the same objects. PAAS' use of probability distributions makes it a richer and more informative model, as much more information is preserved in the calculations. Furthermore, PAAS computes the uncertainty of the automated assessments, helping to suggest which objects should be evaluated next in order to decrease the overall uncertainty of PAAS' calculations.

In COMAS, experiments were conducted on a real classroom dataset as well as on simulated data that considers different social network topologies (where students assess some assignments of socially connected students). Results show that the COMAS method 1) is sound, i.e. the error of the suggested assessments decreases for increasing numbers of tutor assessments; and 2) scales for large numbers of students.

Future work on PAAS should follow a similar approach for evaluation, where the same real classroom datasets can be used as the ground truth of marks, and we can then compare PAAS' automated assessments to that ground truth. Additionally, we could also test the ranking of marks (Section 2.5) by running experiments in a real classroom where we ask the tutor to evaluate assignments once in a random order and another time following the suggested ranking. This could help us check whether the error decreases faster in the latter case. Also, we expect to find that, for a given acceptable uncertainty threshold, the tutor should need to evaluate fewer assignments to reach that threshold than when evaluating randomly.

Acknowledgments

This work is supported by the CollectiveMind project (funded by the Spanish Ministry of Economy and Competitiveness, under grant number TEC2013-49430-EXP) and the PRAISE project (funded by the European Commission, under grant number 388770).

References

[de Alfaro and Shavlovsky, 2013] L. de Alfaro and M. Shavlovsky. CrowdGrader: Crowdsourcing the evaluation of homework assignments. Tech. Report 1308.5273, arXiv.org, 2013.

[Gutierrez et al., submitted for publication] Patricia Gutierrez, Nardine Osman, and Carles Sierra. Trust-based community assessment. Pattern Recognition Letters, submitted for publication.

[Li et al., 2003] Yuhua Li, Zuhair A. Bandar, and David McLean. An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. on Knowl. and Data Eng., 15(4):871-882, July 2003.

[Piech et al., 2013] Chris Piech, Jonathan Huang, Zhenghao Chen, Chuong Do, Andrew Ng, and Daphne Koller. Tuned models of peer assessment in MOOCs. Proc. of the 6th International Conference on Educational Data Mining (EDM 2013), 2013.

[Rubner et al., 1998] Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. A metric for distributions with applications to image databases. In Proceedings of the Sixth International Conference on Computer Vision (ICCV 1998), pages 59-, Washington, DC, USA, 1998. IEEE Computer Society.

[Sierra and Debenham, 2005] Carles Sierra and John Debenham. An information-based model for trust. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS '05, pages 497-504, New York, NY, USA, 2005. ACM.

[Upton and Cook, 2008] G. Upton and I. Cook. A Dictionary of Statistics. Oxford Paperback Reference. OUP Oxford, 2008.

[Walsh, 2014] Toby Walsh. The PeerRank method for peer assessment. In Torsten Schaub, Gerhard Friedrich, and Barry O'Sullivan, editors, ECAI 2014 - 21st European Conference on Artificial Intelligence, 18-22 August 2014, Prague, Czech Republic - Including Prestigious Applications of Intelligent Systems (PAIS 2014), volume 263 of Frontiers in Artificial Intelligence and Applications, pages 909-914. IOS Press, 2014.

[Wu et al., 2015] J. Wu, F. Chiclana, and E. Herrera-Viedma. Trust based consensus model for social network in an incomplete linguistic information context. Applied Soft Computing, 2015.