Personalised Automated Assessments

Patricia Gutierrez and Nardine Osman and Carles Sierra
Artificial Intelligence Research Institute (IIIA-CSIC), Barcelona, Spain
{patricia, nardine, sierra}@iiia.csic.es

Abstract

Consider an evaluator, or an assessor, who needs to assess a large amount of information. For instance, think of a tutor in a massive open online course with thousands of enrolled students, a senior program committee member in a large peer review process who needs to decide the final marks of reviewed papers, or a user in an e-commerce scenario who needs to build up an opinion about products evaluated by others. When assessing a large number of objects, it is sometimes simply unfeasible to evaluate them all, and one often needs to rely on the opinions of others. In this paper we provide a model that uses peer assessments to generate expected assessments and tune them for a particular assessor. Furthermore, we are able to provide a measure of the uncertainty of our computed assessments and a ranking of the objects that should be assessed next in order to decrease the overall uncertainty of the calculated assessments.

1 Introduction

Consider an assessor who needs to assess a large amount of information. For instance, think of a tutor in a massive open online course with thousands of enrolled students, a senior program committee member in a large peer review process who needs to decide the final marks of reviewed papers, or a user in an e-commerce scenario who needs to build up an opinion about products evaluated by others. When assessing a large number of objects, it is sometimes simply unfeasible to evaluate them all, and one often needs to rely on the opinions of others. In the process of building up our opinion, some questions need to be answered, such as: How much should I trust the opinion of a peer? What should I believe given a peer's opinion? What should I believe when many peers give their different opinions? Which objects should be assessed next, such that the certainty of my belief improves?

This paper addresses these questions through the Personalised Automated ASsessment model (PAAS). PAAS uses peer assessment to calculate and predict assessments. However, what is fundamentally different from many previous works [Piech et al., 2013; de Alfaro and Shavlovsky, 2013; Walsh, 2014; Wu et al., 2015] is that the computed peer-based assessment is tuned to the perspective of a specific community member. PAAS aggregates peer assessments, giving more weight to those peers that are trusted by the specific community member for whom the automated assessments are computed. How much this specific member trusts a peer is then based on the similarity, or the evaluation rate, between his (past) assessments and the peer's (past) assessments over the same assignments. To compute such a trust measure, we build a trust network composed of direct and indirect trust values among community members. Direct trust values are derived from common assessments, while indirect trust is based on the notion of transitivity. We clarify that our target is not consensus building, but to accurately estimate unknown assessments from a specific member's point of view, based on the peers' assessments and reliability.

Finally, we are also able to provide a measure of the uncertainty of our calculated assessments and a ranking of the objects that should be assessed next in order to decrease the overall uncertainty of those calculated assessments.

2 The PAAS Model

2.1 Notation and Problem Definition

Let ε represent an assessor who needs to assess a large set of objects I, and let P be a set of peers that are able to assess objects in I.

We understand assessments as probability distributions over an evaluation space E at a given moment in time. For example, one can define the evaluation space for the quality of an English classroom homework as E = {poor, good, excellent}. The assessment {poor ↦ 0, good ↦ 0, excellent ↦ 1} would represent the highest assessment possible, whereas the assessment {poor ↦ 0, good ↦ 1/2, excellent ↦ 1/2} would represent that the quality of the homework is most probably between good and excellent, and so on.

We define an assessment e^α_i (also referred to as an evaluation or opinion) as a probability distribution over the evaluation space E, where α ∈ I is the object being evaluated and i ∈ {ε} ∪ P is the evaluator. We say e^α_i = {x_1 ↦ v_1, ..., x_n ↦ v_n}, where {x_1, ..., x_n} = E and v_i ∈ [0, 1] represents the value assigned to each element x_i ∈ E, with the condition that $\sum_{i=1}^{n} v_i = 1$.
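To make the notation concrete, the following small sketch (illustrative only; the dictionary representation and helper names are ours, not part of the model) shows how such assessments could be represented and validated in Python:

    import math

    # An assessment is a probability distribution over the evaluation space,
    # represented here as a dict from space elements to values in [0, 1].
    E = ["poor", "good", "excellent"]

    def is_valid_assessment(e, space=E):
        """Check that e covers the space, has values in [0, 1], and sums to 1."""
        return (set(e) == set(space)
                and all(0.0 <= v <= 1.0 for v in e.values())
                and math.isclose(sum(e.values()), 1.0))

    # The highest possible assessment, and one between good and excellent:
    e_top = {"poor": 0.0, "good": 0.0, "excellent": 1.0}
    e_mid = {"poor": 0.0, "good": 0.5, "excellent": 0.5}
    assert is_valid_assessment(e_top) and is_valid_assessment(e_mid)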
Finally, we define L as the history of all assessments performed, and O_α ⊂ L as the set of past peer assessments over the object α.

The ultimate goal of our work is to compute the probability distribution of ε's evaluation over a certain object α, given the evaluations of several peers over that same object α. In other words, what is the probability that ε's evaluation is x given the set of peers' evaluations O_α? Such an expectation can be formalised with the conditional probability:

$$p(X{=}x \mid O_\alpha).$$

To calculate the above conditional probability, we take into account every particular evaluation in O_α. In other words, expectations (or probabilities) are calculated for each individual evaluation in O_α, before those expectations are aggregated into p(X=x | O_α). The probability that ε's assessment is x given a particular evaluation e^α_µ ∈ O_α is formalised as follows:

$$p(X{=}x \mid e^\alpha_\mu).$$

The more general probability p(X=x | O_α) is then defined as an aggregation of the individual probabilities:

$$p(X{=}x \mid O_\alpha) = \bigoplus_{e^\alpha_\mu \in O_\alpha} p(X{=}x \mid e^\alpha_\mu)$$

where the exact definition of the aggregation ⊕ is presented later on in Section 2.4.

We strongly base the intuition behind the computation of the individual conditional probabilities on the notion of trust between peers based on previous experiences, where trust is understood in this context as the expected similarity between the assessments given by those peers. In other words, our intuition is that we expect ε will tend to agree with µ's assessments if his trust in µ is high; otherwise, ε's evaluation will probably be different. We then perform a sort of analogical reasoning: if in the past µ gave opinions that were dissimilar to a certain degree from ε's opinions, then this will probably happen again now.

The remainder of this section is divided accordingly. We first describe in detail how the measure of trust between peers is calculated (Section 2.2). Then, we illustrate how to calculate ε's assessment of an object α given µ's assessment of α and ε's trust in µ's assessments (Section 2.3); in other words, we present an approach for calculating the individual probability p(X=x | e^α_µ). We then illustrate how to combine those probabilities to build the probability distribution of ε's assessments given the assessments of several peers (Section 2.4); in other words, we present an approach for calculating the probability p(X=x | O_α). Finally, we provide a measure of the uncertainty of the computed assessments and a ranking of the objects that should be assessed next by ε in order to decrease that uncertainty (Section 2.5).

2.2 Step 1: How much should I trust a peer?

ε needs to decide how much he or she can trust the assessment of a peer µ. We define this trust measure based on the following two intuitions. Our first intuition states that if ε and µ have both assessed the same object, then the similarity of their assessments can give a hint of how close their judgements are. However, cases may arise where there are simply no objects evaluated by both ε and µ. In such a case, one may think of simply neglecting µ's assessment, as ε would not know how much to trust it. Our second intuition, however, proposes an alternative approach for such cases, where we approximate the unknown trust between ε and µ by looking into a chain of trust between ε and µ through other peers. Roughly speaking, we rely on the transitive notion: "if ε trusts µ, and µ trusts µ′, then ε will likely trust µ′". In the following, we define these two intuitions through two different types of trust relations: direct trust and indirect trust.

Direct Trust

Direct trust is the trust relation that emerges between evaluators that have assessed one or more objects in common. One possible approach is to measure such a relation as an aggregation of their evaluations' similarity over those objects assessed in common. For instance, let A_{i,j} = {α | e^α_i, e^α_j ∈ L} be the set of objects that have been assessed by both evaluators i and j. Then different definitions for the direct trust between i and j based on the similarity between two assessments, sim(e^α_i, e^α_j), may be adopted (a computational sketch of these aggregations is given after the list), such as:

• The average of the similarities for all commonly assessed objects:

$$T_D(i,j) = \frac{\sum_{\alpha \in A_{i,j}} sim(e^\alpha_i, e^\alpha_j)}{|A_{i,j}|}$$

• The conjunction of the similarities for all commonly assessed objects:

$$T_D(i,j) = \bigwedge_{\alpha \in A_{i,j}} sim(e^\alpha_i, e^\alpha_j)$$

• The Pearson coefficient [Upton and Cook, 2008], or linear correlation between i and j, for all commonly assessed objects:

$$T_D(i,j) = \frac{\sum_{\alpha \in A_{i,j}} sim(e^\alpha_i, \bar{e}_i) \cdot sim(e^\alpha_j, \bar{e}_j)}{\sqrt{\sum_{\alpha \in A_{i,j}} sim(e^\alpha_i, \bar{e}_i)^2}\ \sqrt{\sum_{\alpha \in A_{i,j}} sim(e^\alpha_j, \bar{e}_j)^2}}$$

where ē_i, ē_j are the means of the evaluations performed over the set A_{i,j} by i and j respectively.
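As an illustration, the following sketch computes the three aggregations for precomputed similarity values; the function names, and the use of min as the conjunction (a common T-norm choice), are our own illustrative assumptions:

    import math

    def avg_trust(sims):
        """Average of the similarities over commonly assessed objects."""
        return sum(sims) / len(sims)

    def conj_trust(sims):
        """Conjunction of the similarities, realised here as the min T-norm."""
        return min(sims)

    def pearson_trust(sims_i, sims_j):
        """Pearson-style correlation between the similarity-to-mean scores."""
        num = sum(a * b for a, b in zip(sims_i, sims_j))
        den = (math.sqrt(sum(a * a for a in sims_i))
               * math.sqrt(sum(b * b for b in sims_j)))
        return num / den if den else 0.0

    sims = [0.9, 0.8, 1.0]       # sim(e^α_i, e^α_j) for each α in A_{i,j}
    print(avg_trust(sims), conj_trust(sims))
    sims_i = [0.9, 0.4, 0.7]     # sim(e^α_i, ē_i) for each α in A_{i,j}
    sims_j = [0.8, 0.5, 0.6]     # sim(e^α_j, ē_j) for each α in A_{i,j}
    print(pearson_trust(sims_i, sims_j))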
However, when we calculate such aggregations we lose relevant information. For instance, we are not able to tell if j usually under-rates with respect to i, if it usually over-rates, or neither. We are also not able to tell whether the dissimilarities between i's and j's evaluations are highly variable or not.

To cope with such loss of information, we define the direct trust between two peers i and j as a probability distribution T_{D_{i,j}} : [0, 1] → [0, 1] built from the historical data of previous evaluations performed by i and j. This probability distribution describes, as we will explain shortly, the expected similarity or the expected evaluation rate between i's and j's assessments. The support of the distribution is [0, 1], since both the expected similarity and the expected evaluation rate are in the range [0, 1], as we will see shortly; and the range of the distribution is [0, 1], as this is a probability distribution and the range of any probability is [0, 1]. Note that we do not consider here any summarising measure for trust that would translate that distribution into a single value, although a number of measures could be used, such as the average similarity (as the centre of gravity of the distribution) or entropy (as a measure of the uncertainty of the distribution).

When defining T_{D_{i,j}} we distinguish two cases: (1) a first case with a non-ordered evaluation space, such as E = {visionary, original, sound}; and (2) a second case with an ordered evaluation space, such as E = {bad, good, excellent}. In the second case, we are interested in maintaining information about whether a peer under-rates or over-rates with respect to another peer; therefore we are interested in the expected evaluation rate between i and j. In the first case, this is not an issue, as assessments cannot be ordered and the notion of under/over-rating does not exist; therefore we are rather interested in the expected similarity between i's and j's assessments. Next we detail the trust probability distributions T_{D_{i,j}} built for both cases.

• Non-Ordered Case. In the non-ordered case, we are interested in the similarity between i's and j's assessments. As such, the support of the distribution representing i's direct trust in j (i.e. the x-axis of T_{D_{i,j}}) consists of the possible degrees of similarity between i's and j's assessments. The trust distribution T_{D_{i,j}}(x) then describes the probability that peers i and j evaluate an object with a similarity x (or the probability that the similarity of their evaluations is x).

• Ordered Case. In the ordered case, we are interested in the evaluation rate e_j/e_i between evaluations made by peers i and j. If e_j/e_i = 1, this means that i and j provide the same evaluation. If e_j/e_i > 1, this means that j over-rates with respect to i. If e_j/e_i < 1, this means that j under-rates with respect to i.

We normalise the evaluation rate to values between 0 and 1. To do so, we require a non-decreasing function r : ℝ → [0, 1] such that lim_{x→∞} r(x) = 1, and for convenience we constrain r(1) = 0.5. We adopt the following normalised evaluation rate function, which satisfies these properties:

$$r(x) = e^{\ln(1/2)/x} \tag{1}$$

As such, the support of the distribution representing i's direct trust in j (i.e. the x-axis of T_{D_{i,j}}) consists of the possible normalised evaluation rates between i and j. The trust distribution T_{D_{i,j}}(x) then describes the probability that i and j would assess an object with a normalised evaluation rate x.
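A quick numerical check of Equation 1 (an illustrative sketch; the function name rate is ours):

    import math

    def rate(x):
        """Normalised evaluation rate of Equation 1: r(x) = e^(ln(1/2)/x)."""
        return math.exp(math.log(0.5) / x)

    # r is non-decreasing, r(1) = 0.5, and r(x) -> 1 as x grows:
    print(rate(0.5))   # 0.25   (j strongly under-rates w.r.t. i)
    print(rate(1.0))   # 0.5    (i and j provide the same evaluation)
    print(rate(2.0))   # ~0.707 (j over-rates w.r.t. i)
    print(rate(1000))  # ~0.999 (approaches 1 in the limit)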
In what follows, we explain how we build direct trust distributions computationally, based on previous experiences. Initially, the direct trust distribution between any two peers is the uniform distribution F = {1/n, ..., 1/n} (describing ignorance), where n is the size of the distribution's support. Every new assessment made then updates the trust distributions accordingly. Consider a new assessment e^α_i. The distribution T_{D_{i,j}}, for all j such that A_{i,j} ≠ ∅, is updated as follows (a computational sketch of this procedure is given after the list):

1. We find the element x in T_{D_{i,j}}'s support whose probability needs to be adjusted. That is, we calculate x = sim(e^α_j, e^α_i) in the non-ordered case (where the definition of sim is domain-dependent and outside the scope of this paper, although we do note that several approaches may be adopted, such as using semantic similarity measures [Li et al., 2003]), or x = r(e^α_j / e^α_i) in the ordered case (Equation 1).

2. We update the probability of the single expectation x in T_{D_{i,j}} accordingly:

$$p(X{=}x) = p(X{=}x) + \gamma \cdot (1 - p(X{=}x)) \tag{2}$$

The update is based on increasing the latest probability p(X=x) by a fraction γ ∈ [0, 1] of the total potential increase (1 − p(X=x)). For instance, if the probability of x is 0.6 and γ is 0.1, then the new probability of x becomes 0.6 + 0.1 · (1 − 0.6) = 0.64. We note that the ideal value of γ should be closer to 0 than to 1, so that one single experience does not result in considerable changes in the distribution. In other words, a single assessment cannot result in a considerable change in the probability distribution; considerable changes can only be the result of information learned from the accumulation of many assessments.

3. We normalise T_{D_{i,j}} by updating the remaining expectations following the entropy-based approach of [Sierra and Debenham, 2005]. The entropy-based approach updates T_{D_{i,j}} such that: (1) the value p(X=x) is maintained, and (2) the resulting distribution has a minimal relative entropy with respect to the previous one. In other words, we look for a distribution that contains the updated probability value p(X=x) and that is at a minimal distance from the original T_{D_{i,j}} (as the relative entropy is a measure of the difference between two probability distributions). Following this approach, we update T_{D_{i,j}}(X) as follows:

$$T_{D_{i,j}}(X) = \arg\min_{P'(X)} \sum_{x'} p(X{=}x') \log \frac{p(X{=}x')}{p'(X{=}x')} \ \text{ such that } \ p(X{=}x) = p'(X{=}x) \tag{3}$$

where p(X=x′) is a probability value in T_{D_{i,j}}, p′(X=x′) is a probability value in P′, and p(X=x) = p′(X=x) specifies the constraint that needs to be satisfied by the resulting distribution.
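The sketch below illustrates the update of Equation 2 followed by renormalisation; we rescale the remaining points proportionally, which is the minimiser of the relative entropy in Equation 3 when the only constraint fixes a single point's probability. The names are ours, and we assume the observed value x coincides with a support point (in practice one would discretise):

    def update_trust(dist, x, gamma=0.1):
        """Update a discrete trust distribution after observing expectation x.

        dist  -- dict mapping support points to probabilities (sums to 1)
        x     -- observed similarity or normalised evaluation rate
        gamma -- learning fraction in [0, 1], kept small (Equation 2)
        """
        p_old = dist[x]
        p_new = p_old + gamma * (1.0 - p_old)           # Equation 2
        # Rescale the other points so the distribution sums to 1 again;
        # proportional rescaling realises the minimum-relative-entropy
        # normalisation of Equation 3 for this single-point constraint.
        scale = (1.0 - p_new) / (1.0 - p_old) if p_old < 1.0 else 0.0
        return {k: (p_new if k == x else v * scale) for k, v in dist.items()}

    # Uniform (ignorance) prior over a support of 5 values:
    support = [0.1, 0.3, 0.5, 0.7, 0.9]
    T = {s: 1.0 / len(support) for s in support}
    T = update_trust(T, 0.5)   # one agreement observed; mass shifts to 0.5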
Indirect Trust

Given a direct trust relation between peers i and j and between peers j and k, the question now is: what can we say about the indirect trust between peers i and k when i and k have no objects assessed in common? In other words, given the direct trust distributions T_{D_{i,j}} and T_{D_{j,k}}, what can we say about the indirect trust distribution T_{I_{i,k}}?

As with direct trust distributions, we distinguish two cases: a first case where assessments cannot be ordered and thus trust is based on a similarity measure sim; and a second case where assessments can be ordered and thus trust is based on the normalised evaluation rate function r(x) = e^{ln(1/2)/x}.

• Non-Ordered Case. In this case, we want to preserve the fundamental triangular inequality property of similarity functions, which says that: T-norm(sim(a, b), sim(b, c)) ≤ sim(a, c). As with T_{D_{i,k}}, the support (or the x-axis) of T_{I_{i,k}} consists of the possible degrees of similarity between i's and k's assessments. But since these degrees of similarity should satisfy the T-norm, the support is defined as the set:

$$supp(T_{I_{i,k}}) = \{x_{ik} = \text{T-norm}(x_{ij}, x_{jk}) \mid x_{ij} \in supp(T_{D_{i,j}}) \wedge x_{jk} \in supp(T_{D_{j,k}})\}$$

where supp represents the support of a distribution. We then compute the probabilities of the expectations of T_{I_{i,k}} as follows:

$$\{p(X{=}x_{ik}{=}\text{T-norm}(x_{ij}, x_{jk})) = T_{D_{i,j}}(x_{ij}) \cdot T_{D_{j,k}}(x_{jk}) \mid x_{ij} \in supp(T_{D_{i,j}}) \wedge x_{jk} \in supp(T_{D_{j,k}})\} \tag{4}$$

This could result in more than one probability computed for the same expectation x_{ik}. As such, we then add up all the probabilities that correspond to the same expectation x_{ik}.

We note that we follow a conservative approach by adopting the product operator (Equation 4), which is a T-norm that gives the smallest possible values, as we prefer not to overrate indirect trust values since they are not inferred directly from historical data. Of course, other operators could also be used, such as the min function.

• Ordered Case. In this case, we want to preserve the property e_j/e_i · e_k/e_j = e_k/e_i with respect to the evaluations performed by i, j and k. For instance, if the evaluation rate between e_j and e_i is 0.5 (j under-rates by 50% with respect to i) and the evaluation rate between e_k and e_j is 0.5 (k under-rates by 50% with respect to j), then the evaluation rate between e_k and e_i should be 0.25 (so k under-rates by 75% with respect to i).

As above, the support (or the x-axis) of T_{I_{i,k}} consists of the possible normalised evaluation rates between i's and k's assessments. The support is then defined as the set:

$$supp(T_{I_{i,k}}) = \{x_{ik} = x_{ij} \cdot x_{jk} \mid x_{ij} \in supp(T_{D_{i,j}}) \wedge x_{jk} \in supp(T_{D_{j,k}})\}$$

We then compute the probabilities of the expectations of T_{I_{i,k}} as follows:

$$\{p(X{=}x_{ik}{=}x_{ij} \cdot x_{jk}) = T_{D_{i,j}}(x_{ij}) \cdot T_{D_{j,k}}(x_{jk}) \mid x_{ij} \in supp(T_{D_{i,j}}) \wedge x_{jk} \in supp(T_{D_{j,k}})\} \tag{5}$$

Again, this could result in more than one probability computed for the same expectation x_{ik}. As such, we then add up all the probabilities that correspond to the same expectation x_{ik}.
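The following sketch composes two direct trust distributions into an indirect one as in Equations 4 and 5 (the function is ours; the product is used both as the T-norm on support points and as the combination of probabilities, and the rounding of support points is an illustrative discretisation choice):

    from collections import defaultdict

    def compose_trust(T_ij, T_jk, tnorm=lambda a, b: a * b, ndigits=3):
        """Indirect trust T_ik from T_ij and T_jk (Equations 4 and 5).

        Every pair of support points is combined with the T-norm (here the
        product); probabilities multiply, and probabilities landing on the
        same combined support point are added up.
        """
        T_ik = defaultdict(float)
        for x_ij, p_ij in T_ij.items():
            for x_jk, p_jk in T_jk.items():
                x_ik = round(tnorm(x_ij, x_jk), ndigits)
                T_ik[x_ik] += p_ij * p_jk
        return dict(T_ik)

    T_ij = {0.5: 0.8, 1.0: 0.2}
    T_jk = {0.5: 0.5, 1.0: 0.5}
    print(compose_trust(T_ij, T_jk))
    # {0.25: 0.4, 0.5: 0.5, 1.0: 0.1} -- 0.5 collects mass from two pairs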
The calculations presented above provide an approach for calculating the indirect trust between two peers i and k when those peers are linked through a direct trust chain passing through only one intermediate peer j. For direct trust chains of increasing length between i and k, the previous process is iterated. For instance, if there is a direct trust chain linking i to j, j to m, and m to k, then we first compute the indirect trust distribution T_{I_{i,m}} from the direct trust distributions T_{D_{i,j}} and T_{D_{j,m}}, and then we compute the indirect trust distribution T_{I_{i,k}} from the direct/indirect trust distributions T_{I_{i,m}} and T_{D_{m,k}}, following the same approach as above.

When multiple chains of direct trust connect two peers (e.g. a chain linking i to j and j to k, and another chain linking i to m and m to k), we obtain multiple indirect trust distributions (one from every chain). In those cases, we pick the resulting distribution that is most optimistic. In other words, while our approach to calculating indirect trust follows a pessimistic approach (through our choice of the product operator in Equations 4 and 5), we now choose the most optimistic of the pessimistic outcomes. To do that, we choose the distribution that is closest to the equivalence distribution, which is a distribution describing that the evaluations of two peers are equivalent. In the non-ordered case, the equivalence distribution is P_E(1) = 1; that is, the similarity between the two peers is maximal. In the ordered case, the equivalence distribution is P_E(0.5) = 1; that is, the normalised evaluation rate between the two peers is 0.5, which implies that they always provide the same evaluation. The distance between an indirect trust distribution T_{I_{i,k}} and the equivalence distribution P_E can be calculated as:

$$emd(T_{I_{i,k}}, P_E) \tag{6}$$

where emd is the earth mover's distance, which calculates the distance between two probability distributions [Rubner et al., 1998].¹ We note that the range of emd is [0, 1], where 0 represents the minimum distance and 1 represents the maximum possible distance.

In the remainder of this paper, when we refer explicitly to a direct or indirect trust distribution between peers i and j, we refer to such a distribution as T_{D_{i,j}} or T_{I_{i,j}}, respectively; whereas when we refer generically to a trust distribution that could be either, we refer to it as T_{i,j}.

¹ If probability distributions are viewed as piles of dirt, then the earth mover's distance measures the minimum cost of transforming one pile into the other. This cost is equivalent to the 'amount of dirt' times the distance by which it is moved, or the distance between elements of the probability distribution's support.
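For one-dimensional distributions over a common ordered support, the earth mover's distance reduces to the accumulated mass difference swept across the support, a standard closed form; the sketch below (our own helper) computes it, and for supports within [0, 1] the result is already in [0, 1] as stated above:

    def emd_1d(P, Q, support):
        """Earth mover's distance between two distributions over a common
        ordered 1-D support (dicts from support points to probabilities)."""
        work, carry = 0.0, 0.0
        for a, b in zip(support, support[1:]):
            carry += P.get(a, 0.0) - Q.get(a, 0.0)   # dirt left to move right
            work += abs(carry) * (b - a)             # cost of moving it to b
        return work

    # Distance of an indirect trust distribution to the ordered-case
    # equivalence distribution P_E(0.5) = 1:
    support = [0.25, 0.5, 1.0]
    T_ik = {0.25: 0.4, 0.5: 0.5, 1.0: 0.1}
    P_E  = {0.5: 1.0}
    print(emd_1d(T_ik, P_E, support))   # 0.15: fairly close to equivalence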
Trust Graph

Direct and indirect trust relations in a community can be represented by a weighted directed graph. We define a community's trust graph as:

$$G = \langle N, E, w \rangle$$

where the set of nodes N is the set of evaluators in {ε} ∪ P, E ⊆ N × N is the set of edges between evaluators with direct or indirect trust relations, and w : E ↦ [0, 1]^n is the weight of an edge, described as a trust probability distribution.

D ⊂ E is the set of edges that link evaluators with direct trust relations: D = {(i, j) ∈ E | T_{D_{i,j}} ≠ ⊥}. Similarly, I ⊂ E is the set of edges that connect evaluators with indirect trust relations: I = {(i, j) ∈ E | T_{I_{i,j}} ≠ ⊥} \ D. We note that the set of edges E is then composed of the union of the sets of direct and indirect edges: E = D ∪ I. Weights in w describe direct and indirect trust probability distributions and are defined as follows:

$$w(i,j) = \begin{cases} T_{D_{i,j}}, & \text{if } (i,j) \in D \\ T_{I_{i,j}}, & \text{if } (i,j) \in I \end{cases}$$

Our goal is to determine how much a particular evaluator ε can trust a peer µ, so the trust graph is constructed with respect to ε's point of view only. Therefore, we maintain a trust graph of the whole community containing all the direct edges between peers (as they are needed to calculate indirect trust relations), but we only maintain the indirect edges that connect ε with the rest of the peers.
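A minimal sketch of such a graph (the representation and names are ours; each edge carries a trust distribution plus a flag recording whether it is direct, so that I excludes D as defined above):

    class TrustGraph:
        """Nodes are evaluator ids; each edge (i, j) stores its trust
        distribution and whether the relation is direct or indirect."""

        def __init__(self):
            self.edges = {}   # (i, j) -> {"dist": {...}, "direct": bool}

        def add_direct(self, i, j, dist):
            self.edges[(i, j)] = {"dist": dist, "direct": True}

        def add_indirect(self, i, j, dist):
            # Indirect edges never overwrite direct ones (I excludes D).
            if (i, j) not in self.edges:
                self.edges[(i, j)] = {"dist": dist, "direct": False}

        def trust(self, i, j):
            """w(i, j): the trust distribution on edge (i, j), if any."""
            e = self.edges.get((i, j))
            return e["dist"] if e else None

    g = TrustGraph()
    g.add_direct("eps", "mu", {0.5: 0.2, 1.0: 0.8})
    g.add_indirect("eps", "nu", {0.25: 0.4, 0.5: 0.5, 1.0: 0.1})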
Information Decay

An important notion in our proposal is the notion of the decay of information: we say the integrity of information decreases with time. In other words, the information provided by a trust probability distribution should lose its value over time and decay towards a default value. We refer to this default value as the decay limit distribution D. For instance, D may be the uniform distribution, which describes that trust information learned from past experiences tends towards ignorance over time.

To implement such a decay mechanism, we need to:

1. Record all evaluations e^α_i ∈ L made at time t with a timestamp t, noted e^{αt}_i.

2. Record all direct trust distributions T_{D_{i,j}} with a timestamp t, noted T^t_{D_{i,j}}, where t is the timestamp of the last evaluation that modified the trust distribution. The first time T_{D_{i,j}} is calculated, t is the timestamp of the latest of the two evaluations leading to this calculation. (Recall that it is the similarity between two evaluations, or their evaluation rate, that updates the probability distribution.) Then, every time a new evaluation with timestamp t′ > t is considered to update T^t_{D_{i,j}}, T^t_{D_{i,j}} is first decayed from t to t′ before the distribution is updated.

3. Record all indirect trust distributions T_{I_{i,j}} with a timestamp t, noted T^t_{I_{i,j}}, where t is the time the distribution is calculated. Every time T_{I_{i,j}} is calculated, all probability distributions involved in this calculation first need to be decayed to the time of calculation t. The time of calculation is usually the latest timestamp amongst the timestamps of the distributions involved in this calculation.

Information in a trust probability distribution T_{i,j} decays from t to t′ (where t′ > t) as follows:

$$T^{t \to t'}_{i,j} = \Lambda(D, T^t_{i,j}) \tag{7}$$

where Λ is the decay function satisfying the property lim_{t′→∞} T^{t→t′}_{i,j} = D. One possible definition for Λ could be:

$$T^{t \to t'}_{i,j} = \nu^{\Delta_{t,t'}} \cdot T^t_{i,j} + (1 - \nu^{\Delta_{t,t'}})\, D \tag{8}$$

where ν is the decay rate, and:

$$\Delta_{t,t'} = \begin{cases} 0, & \text{if } t' - t < \omega \\ 1 + \dfrac{t' - t}{t_{max}}, & \text{otherwise} \end{cases}$$

The definition of Δ_{t,t′} above serves the purpose of establishing a minimum grace period, determined by the parameter ω, during which the information does not decay; once the grace period is reached, the information starts decaying. The parameter t_{max}, which may be defined in terms of multiples of ω, controls the pace of decay. The main idea behind this is that after the grace period the decay happens very slowly; in other words, ν^{Δ_{t,t′}} decreases very slowly.
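A sketch of the decay of Equations 7 and 8 (the parameter values are illustrative, not prescribed by the model):

    def decay(T, D, t, t_prime, nu=0.9, omega=10.0, t_max=100.0):
        """Decay trust distribution T (timestamped t) towards the decay
        limit distribution D at a later time t_prime (Equations 7 and 8)."""
        dt = t_prime - t
        delta = 0.0 if dt < omega else 1.0 + dt / t_max   # grace period omega
        a = nu ** delta                   # equals 1 inside the grace period
        support = set(T) | set(D)
        return {x: a * T.get(x, 0.0) + (1 - a) * D.get(x, 0.0)
                for x in support}

    support = [0.25, 0.5, 1.0]
    D = {x: 1.0 / len(support) for x in support}   # uniform decay limit
    T = {0.25: 0.1, 0.5: 0.7, 1.0: 0.2}
    print(decay(T, D, t=0.0, t_prime=5.0))    # unchanged: within grace period
    print(decay(T, D, t=0.0, t_prime=50.0))   # pulled towards the uniform D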
2.3 Step 2: What to believe when a peer gives an opinion?

Given a peer assessment e^α_µ, the question now is how to compute the probability distribution of ε's evaluation. In other words, what is the probability that ε's evaluation of α is x, given that µ evaluated α with e^α_µ? As illustrated earlier, this is expressed as the conditional probability:

$$P(X{=}x \mid e^\alpha_\mu)$$

To calculate this conditional probability, the intuition is that ε would tend to agree with µ's evaluation if his trust in µ (that is, the expected similarity between their assessments, or the expected evaluation rate between their assessments) is high; otherwise, ε's evaluation would probably be different. We then perform a sort of analogical reasoning: if in the past µ gave assessments that were dissimilar to a certain degree from ε's opinions, or with a certain evaluation rate with respect to ε, then this will probably happen again now.

We then calculate the above conditional probability based on the following desired properties:

• If T_{ε,µ} is a flat distribution (i.e. a distribution representing ignorance), then P(X | e^α_µ) should also be a flat distribution. That is, the closer ε's trust in µ is to ignorance, the less information µ is giving to ε with his/her assessment.

• The degree of belief e^α_ε = x should increase for those points x whose similarity (or evaluation rate, in the ordered case) to e^α_µ is high (i.e. for higher values of T_{ε,µ}).

• The degree of belief e^α_ε = x should decrease for those points x whose similarity (or evaluation rate, in the ordered case) to e^α_µ is low (i.e. for lower values of T_{ε,µ}).

Formally, these properties are achieved by defining the probabilities accordingly (where the denominator of the following two equations, Equations 9 and 10, is used for normalisation to ensure that the resulting distribution is a probability distribution):

• Non-Ordered Case.

$$p(X{=}x \mid e^\alpha_\mu) = \frac{e^{T_{\epsilon,\mu}(sim(e^\alpha_\mu, x)) \cdot I(T_{\epsilon,\mu})}}{\sum_{x' \in E} e^{T_{\epsilon,\mu}(sim(e^\alpha_\mu, x')) \cdot I(T_{\epsilon,\mu})}} \tag{9}$$

• Ordered Case.

$$p(X{=}x \mid e^\alpha_\mu) = \frac{e^{T_{\epsilon,\mu}(r(e^\alpha_\mu / x)) \cdot I(T_{\epsilon,\mu})}}{\sum_{x' \in E} e^{T_{\epsilon,\mu}(r(e^\alpha_\mu / x')) \cdot I(T_{\epsilon,\mu})}} \tag{10}$$

where I(T_{ε,µ}) is a measure of how informative the probability distribution T_{ε,µ} is. We calculate I(T_{ε,µ}) as:

$$I(T_{\epsilon,\mu}) = 1 - H(T_{\epsilon,\mu}) \tag{11}$$

where H describes the entropy of a probability distribution. In other words, the lower the entropy of the distribution, the more informative it is, and vice versa.

We finally define the probability distribution of ε's expected evaluation given µ's opinion accordingly as P(X | e^α_µ), where X varies over the evaluation space E.
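A sketch of Equations 9 to 11 for the ordered case. We make three illustrative assumptions of our own: evaluation-space elements are encoded as numeric marks and µ's assessment is taken as a single mark (e.g. the mean of its distribution) so the rate e^α_µ / x is computable; the trust distribution is discretised, so we read it at the nearest support point; and the entropy is normalised to [0, 1] so that I stays in range:

    import math

    def entropy(dist):
        """Entropy of a discrete distribution, normalised to [0, 1]."""
        h = -sum(p * math.log(p) for p in dist.values() if p > 0)
        return h / math.log(len(dist)) if len(dist) > 1 else 0.0

    def rate(x):
        """Normalised evaluation rate of Equation 1."""
        return math.exp(math.log(0.5) / x)

    def nearest(T, x):
        """Trust probability at the support point nearest to x."""
        return T[min(T, key=lambda s: abs(s - x))]

    def belief_given_opinion(T, mark_mu, marks):
        """P(X = x | e^α_µ) for the ordered case (Equations 10 and 11)."""
        info = 1.0 - entropy(T)                          # Equation 11
        scores = {x: math.exp(nearest(T, rate(mark_mu / x)) * info)
                  for x in marks}
        z = sum(scores.values())                         # normalisation
        return {x: s / z for x, s in scores.items()}

    marks = [1, 2, 3]                                    # numeric encoding of E
    T = {0.25: 0.05, 0.5: 0.85, 0.71: 0.05, 1.0: 0.05}   # rate is usually 0.5
    print(belief_given_opinion(T, mark_mu=2, marks=marks))
    # belief peaks at mark 2: ε's trust says µ usually agrees with ε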
2.4 Step 3: What to believe when many give opinions?

In the previous section we computed P(X | e^α_µ); that is, the probability distribution of ε's evaluation of α given the evaluation of a peer µ on α. But what does ε do when there is more than one peer assessing α?

Given the set of opinions O_α describing a set of peer evaluations over the object α, we define the probability of ε's assessment being x as follows:

$$p(X{=}x \mid O_\alpha) = \frac{\prod_{\mu \in O_\alpha} p(X{=}x \mid e^\alpha_\mu)}{\sum_{x' \in E} \prod_{\mu \in O_\alpha} p(X{=}x' \mid e^\alpha_\mu)} \tag{12}$$

In other words, the probability of ε's assessment of α being x given the set of opinions over α is an aggregation (a product in this case) of the probabilities of ε's assessment of α being x given each evaluation e^α_µ ∈ O_α.

We then define the probability distribution of ε's expected evaluation given all opinions in O_α as P(X | O_α), where X varies over the evaluation space E.

We note that instead of the product operator ∏ other connectives could be used, for instance the min operator. However, using the minimum operator does not take into account the number of assessments made; that is, having the assessments of 20 peers could be equivalent to having the assessment of just one peer. In fact, the proposed aggregation of Equation 12 ensures that:

• The larger the number of identical opinions, the less uncertain the final probability distribution is, and

• The more trusted the opinions, the less uncertain the final probability distribution is.

Finally, to translate the final assessment from a probability distribution P(X | O_α) into a single value, we calculate the mean (average) of the distribution and select the closest mark to that mean.

2.5 Step 4: What should be evaluated next?

The previous three steps have provided a model to calculate automated assessments of objects that have not been assessed by ε, based on peers' opinions. The level of uncertainty of the automated assessments generated by our model can be calculated as the uncertainty of the probability distribution of ε's expected evaluation based on those peers' opinions, P(X | O_α). This level of uncertainty is measured by the distribution's entropy:

$$H(P(X \mid O_\alpha))$$

The question that naturally arises then is: which objects should be assessed next by ε to decrease such uncertainties? For example, how many more assignments should a tutor evaluate so that the uncertainty of the calculated assessments becomes acceptable? We suggest that ε evaluate the objects with maximum uncertainty, or maximum entropic value. The ranking of objects with respect to their entropic value is then defined as follows:

$$Rank(\alpha) = 1 - H(P(X \mid O_\alpha)) = 1 + \sum_{x \in E} p(X{=}x \mid O_\alpha) \ln p(X{=}x \mid O_\alpha) \tag{13}$$

ε can then continue to evaluate objects one by one until the uncertainty of the automated assessments becomes less than some predefined acceptable uncertainty threshold.
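Putting Steps 3 and 4 together, the sketch below aggregates per-peer belief distributions with Equation 12 and scores objects with Equation 13; the names are ours, and the distributions b1 and b2 are made-up stand-ins for the outputs of Equations 9 and 10:

    import math

    def aggregate(per_peer_beliefs):
        """Equation 12: normalised product of the per-peer distributions
        P(X | e^α_µ) for one object."""
        marks = per_peer_beliefs[0].keys()
        prod = {x: math.prod(b[x] for b in per_peer_beliefs) for x in marks}
        z = sum(prod.values())
        return {x: v / z for x, v in prod.items()}

    def rank(agg):
        """Equation 13: 1 - H(P(X | O_α)); lower values mean more
        uncertainty, so the lowest-ranked objects are evaluated first."""
        return 1.0 + sum(p * math.log(p) for p in agg.values() if p > 0)

    # Two peers pointing at the same mark leave less uncertainty than one:
    b1 = {1: 0.2, 2: 0.6, 3: 0.2}
    b2 = {1: 0.1, 2: 0.7, 3: 0.2}
    one = aggregate([b1])
    two = aggregate([b1, b2])
    print(rank(one) < rank(two))   # True: agreeing opinions reduce entropy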
3 Conclusions and Future Work

In this paper we have presented the Personalised Automated ASsessments model (PAAS), a trust-based assessment service that helps compute group assessments from the perspective of a specific community member. This computation essentially aggregates peer assessments, giving more weight to those peers that are trusted by the specific community member for whom the automated assessments are computed. How much this specific member trusts a peer is then based on the similarity, or the evaluation rate, between his (past) assessments and the peer's (past) assessments over the same assignments.

The proposed work is an extension of the work carried out in [Gutierrez et al., submitted for publication]. In fact, the COMAS model is a much simplified version of the non-ordered case, as it assumes that the probability of the similarity between two assessors is 1 for the aggregation of the similarities of past evaluations over the same objects. PAAS' use of probability distributions makes it a richer and more informative model, as much more information is preserved in the calculations. Furthermore, PAAS computes the uncertainty of the automated assessments, helping to suggest which objects should be evaluated next in order to decrease the overall uncertainty of PAAS' calculations.

In COMAS, experiments were conducted on a real classroom dataset as well as on simulated data that considers different social network topologies (where students assess some assignments of socially connected students). Results show that the COMAS method 1) is sound, i.e. the error of the suggested assessments decreases for increasing numbers of tutor assessments; and 2) scales for large numbers of students.

Future work on PAAS should follow a similar approach for evaluation, where the same real classroom datasets can be used as the ground truth of marks, and we can then compare PAAS' automated assessments to that ground truth. Additionally, we could also test the ranking of marks (Section 2.5) by running experiments in a real classroom where we ask the tutor to evaluate assignments once in a random order and another time following the suggested ranking. This could help us check whether the error decreases faster in the latter case. Also, we expect to find that, for a given acceptable uncertainty threshold, the tutor should need to evaluate fewer assignments to reach that threshold than when evaluating randomly.

Acknowledgments

This work is supported by the CollectiveMind project (funded by the Spanish Ministry of Economy and Competitiveness, under grant number TEC2013-49430-EXP) and the PRAISE project (funded by the European Commission, under grant number 388770).

References

[de Alfaro and Shavlovsky, 2013] L. de Alfaro and M. Shavlovsky. CrowdGrader: Crowdsourcing the evaluation of homework assignments. Tech. Report 1308.5273, arXiv.org, 2013.

[Gutierrez et al., submitted for publication] Patricia Gutierrez, Nardine Osman, and Carles Sierra. Trust-based community assessment. Pattern Recognition Letters, submitted for publication.

[Li et al., 2003] Yuhua Li, Zuhair A. Bandar, and David McLean. An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. on Knowl. and Data Eng., 15(4):871-882, July 2003.

[Piech et al., 2013] Chris Piech, Jonathan Huang, Zhenghao Chen, Chuong Do, Andrew Ng, and Daphne Koller. Tuned models of peer assessment in MOOCs. Proc. of the 6th International Conference on Educational Data Mining (EDM 2013), 2013.

[Rubner et al., 1998] Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. A metric for distributions with applications to image databases. In Proceedings of the Sixth International Conference on Computer Vision (ICCV 1998), pages 59-, Washington, DC, USA, 1998. IEEE Computer Society.

[Sierra and Debenham, 2005] Carles Sierra and John Debenham. An information-based model for trust. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS '05, pages 497-504, New York, NY, USA, 2005. ACM.

[Upton and Cook, 2008] G. Upton and I. Cook. A Dictionary of Statistics. Oxford Paperback Reference. OUP Oxford, 2008.

[Walsh, 2014] Toby Walsh. The PeerRank method for peer assessment. In Torsten Schaub, Gerhard Friedrich, and Barry O'Sullivan, editors, ECAI 2014 - 21st European Conference on Artificial Intelligence, 18-22 August 2014, Prague, Czech Republic - Including Prestigious Applications of Intelligent Systems (PAIS 2014), volume 263 of Frontiers in Artificial Intelligence and Applications, pages 909-914. IOS Press, 2014.

[Wu et al., 2015] J. Wu, F. Chiclana, and E. Herrera-Viedma. Trust based consensus model for social network in an incomplete linguistic information context. Applied Soft Computing, 2015.