=Paper=
{{Paper
|id=Vol-1633/ws1-paper1
|storemode=property
|title=Probing the Landscape: Toward a Systematic Taxonomy of Online Peer Assessment Systems in Education
|pdfUrl=https://ceur-ws.org/Vol-1633/ws1-paper1.pdf
|volume=Vol-1633
|authors=Dmytro Babik,Edward Gehringer,Jennifer Kidd,Ferry Pramudianto,David Tinapple
|dblpUrl=https://dblp.org/rec/conf/edm/BabikGKPT16
}}
==Probing the Landscape: Toward a Systematic Taxonomy of Online Peer Assessment Systems in Education==
Dmytro Babik, James Madison University, 421 Bluestone Dr., Harrisonburg, VA 22807, +1 (540) 568-3064, babikdx@jmu.edu
Edward F. Gehringer, North Carolina State University, Department of Computer Science, Raleigh, NC 27695, +1 (919) 515-2066, efg@ncsu.edu
Jennifer Kidd, Old Dominion University, 166-7 Education Building, Norfolk, VA 23529, +1 (757) 683-3248, jkidd@odu.edu
Ferry Pramudianto, North Carolina State University, Department of Computer Science, Raleigh, NC 27695, +1 (919) 513-0816, fferry@ncsu.edu
David Tinapple, Arizona State University, Dixie Gammage Hall, Tempe, AZ 85287, +1 (480) 965-3122, david.tinapple@asu.edu

ABSTRACT
We present the research framework for a taxonomy of online educational peer-assessment systems. This framework enables researchers in technology-supported peer assessment to understand the current landscape of technologies supporting student peer review and assessment, specifically, its affordances and constraints. The framework helps identify the major themes in existing and potential research and formulate an agenda for future studies. It also informs educators and system design practitioners about use cases and design options.

Keywords
Peer assessment, peer review, system design, rubric, scale

1. INTRODUCTION
In the twenty years that the web has been widely used in education, dozens, if not hundreds, of online peer assessment systems have appeared. They have been conceived by educators in many disciplines, such as English, Computer Science, and Design, to name a few. Topping [29] highlighted computer-aided peer assessment as an important pedagogical approach to developing higher-level competencies. Surprisingly, most of these systems have been designed "from the ground up": until now, there is little evidence that designers and developers of one system have consulted other systems to see what existing techniques are appropriate to their experience, and what can be done better. Several authors have conducted reviews of existing peer assessment approaches [4, 5, 8, 11, 19, 25, 28]. To the best of our knowledge, however, no one has proposed a systematic research framework for exploring and generalizing the affordances and constraints of educational technology-enabled peer assessment systems.

Our Peerlogic project, funded by the National Science Foundation under grants 1432347, 1431856, 1432580, 1432690, and 1431975, is pursuing two primary goals: (1) to systematically explore the domain of technology-enabled peer assessment systems, and (2) to develop an arsenal of web services for a wide range of applications in such systems. We have examined a number of these systems, including such better-known ones as Calibrated Peer Review [21], CritViz [27], CrowdGrader [7], Expertiza [10], Mobius SLIP [2], Peerceptiv [5], and peerScholar [14]. We adopt the term "online peer assessment system" to describe the broad range of computer applications purposefully designed and developed to support student peer review and assessment. Specifically, we define an online peer-assessment system as a web-based application that facilitates the peer assessment workflow: collecting submission artifacts, allocating reviewers to critique and/or evaluate designated artifacts submitted by peers, setting deadlines, and guiding reviewers on the format of the qualitative and quantitative feedback. This term covers the class of systems described in the literature as "computer (technology, IT, CIT, ICT, network, internet, web, cloud)-aided (assisted, based, enabled, mediated, supported)" peer assessment (review, evaluation) systems, in any combination. Online peer-assessment systems are a subset of a general class of social computing systems that involve peer review (including social networking and social-media applications, such as wikis, blogs, and discussion forums), but are distinguished by having specific workflow constraints and being directed at specific educational goals.

The purpose of this paper is to set up a framework for the systematic review and analysis of the current state of online peer assessment systems. We contrast our study with the earlier surveys by Luxton-Reilly [19] and Søndergaard and Mulder [25], which considered the facilities of individual systems one by one and then contrasted them. Our approach is to discuss functionalities of systems, and then describe how individual systems realize those functionalities. Thus, in a sense, it is a dual of the earlier papers. Alternatively, one might say it applies the jigsaw technique [33] to them. Because of space limitations, this paper only begins to apply the taxonomy, which we will elaborate and extend in a future paper.

We use our framework to examine affordances and limitations of the systems that have been developed since 2005 and how they address pedagogical, philosophical, and technological decisions. We also exploit the framework to develop a research agenda to guide future studies. In this paper, we begin to address these important research questions: What is the current state of online peer assessment in education? How is technology transforming and advancing student peer review?

We address this study to several audiences: peer assessment researchers, practitioners, system designers, and educational technologists. Researchers in learning analytics can learn what peer-assessment data can be extracted and mined. Software designers can learn what has been designed and implemented in the past. Instructors applying peer review pedagogy in their classes can find what systems and functionality would best meet their needs. Instructors may turn to ed-tech specialists and instructional designers to answer these questions; thus, the latter also constitute an audience for this work.
Conversely, marketers of these systems may identify the unique features of their systems so they can inform their constituencies.

2. FRAMEWORK AND METHODOLOGY

2.1 Framework
We applied a grounded theory approach to construct our framework. First, we identified all possible use cases occurring in online peer assessment. For this, we used an informal focus group, where faculty using peer assessment in their pedagogy described various situations and scenarios. In addition, academic papers on peer assessment were reviewed and relevant practices were brought to the discussion. Through this discussion of practices, the peer assessment process use cases were identified and categorized. Next, these use case categories were formalized as objectives of the peer assessment process. Thus, we obtained a classification of system-independent peer assessment objectives and the respective use cases that support these objectives (Table 1).

Table 1. Primary objectives for online peer assessment systems
I. Eliciting evaluation: How do student reviewers input evaluation data (quantitative and qualitative, structured and semi-structured)? What input controls are used to elicit responses?
II. Assessing achievement and generating learning analytics: How are peer assessment results computed and presented to instructors and to students? What assessment metrics can be used?
III. Structuring automated peer assessment workflow: What is the process of online peer review? What variations of this process exist?
IV. Reducing or controlling for evaluation biases: How can assessment subjectivity be reduced or controlled for? What metrics of assessment inaccuracy can be used?
V. Changing the social atmosphere of the learning community: How can online peer assessment be conducted to achieve higher-level learning and other benefits?

These objectives and use cases are system-independent because they are not determined by the system in which they are realized but rather by user needs independent of any system. In this paper, for illustration purposes, we focus only on objective I (Table 2).

Next, we examined a sample set of online peer assessment systems to identify how these use cases are implemented as functionality (features). In this study, we focus on functionality relevant specifically to student peer-to-peer interactions in the review and assessment process and ignore complementary functionality that is germane to any learning, knowledge management, or communication system (such as learning-object content management). A given use case may be implemented in various systems as different ensembles of features, with varying design options. Therefore, functionality and design options are system-dependent. For each functionality, specific design options were identified and categorized.

Visually, our framework can be represented as hierarchically organized layers, where the top layer comprises objectives, which determine use cases, supported by functionality, implemented as specific design options (Figure 1). An illustrative data-structure sketch of these layers is given at the end of this section.

Figure 1. Research framework for a taxonomy of online peer assessment systems.

2.2 Data Collection and Analysis
Data collection was conducted through iterative paper presentation, system demonstrations, and discussions, documented as written notes and video recordings (including screencasts) shared online. Over three years, the authors have reviewed and experimented with multiple available systems, designed and implemented their own systems, systematically reviewed the literature, and collaborated with other creators and users of systems in research and practice.

Identified, categorized, and formalized themes, patterns, use cases, and design choices led to the construction of the framework. We then used this framework to design questionnaires for surveys and structured interviews to collect additional data on each identified system. Collected data was synthesized in a spreadsheet, with formally defined "cases" and "variables". Our current sample includes 40 systems described in the literature and found on the web. For the purpose of this paper, we illustrate our analysis with a subsample of selected systems (Figure 2). Finally, the multi-case method will be used to complete our taxonomy and to answer our research questions in the full study.
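To make the layered framework of Figure 1 concrete, the following is a minimal, illustrative sketch (our own notation, not drawn from any surveyed system) of how the four layers could be encoded as plain data; it instantiates Objective I with the use cases, functionalities, and design options examined in Section 3.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Functionality:
    name: str                   # system-dependent functionality (feature)
    design_options: List[str]   # system-dependent design options

@dataclass
class UseCase:
    name: str                   # system-independent use case
    functionalities: List[Functionality]

@dataclass
class Objective:
    name: str                   # system-independent objective (Table 1)
    descriptive_question: str
    use_cases: List[UseCase]

# Objective I, anticipating the use cases and functionalities detailed in Section 3 (Table 2).
objective_I = Objective(
    name="I. Eliciting evaluation",
    descriptive_question="How do student reviewers input evaluation data?",
    use_cases=[
        UseCase("Eliciting quantitative peer evaluation",
                [Functionality("Rubrics", ["Holistic", "Specific/analytic"]),
                 Functionality("Scales", ["Rating", "Ranking"])]),
        UseCase("Eliciting qualitative peer evaluation, critiquing and commenting",
                [Functionality("Critique artifact media types",
                               ["Plain text", "Rich text / hypertext / URL",
                                "Inline file annotation", "Multimedia attachments"]),
                 Functionality("Contextualization of critiques",
                               ["Non-contextualized", "Contextualized"])]),
    ],
)
```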
3. SAMPLE ANALYSIS
To demonstrate the application of our research framework to the analysis of online peer assessment systems in education, in this paper we focus on Objective I, "Eliciting evaluation". We analyze the input mechanisms and controls that students use to conduct peer assessment. In general, the review process involves two tasks: (a) providing quantitative evaluations based on one or more criteria and using some scale, and (b) providing qualitative critiques or comments on peers' artifacts. Therefore, this objective is manifested in two distinct use cases: (I) "Eliciting quantitative peer evaluation" and (II) "Eliciting qualitative peer evaluation, critiquing and commenting". Use case I is supported by two functionalities: rubrics and scales used for quantitative assessment; use case II is also supported by two functionalities: critique artifact media types and contextualization of critiques (Table 2). We present below the taxonomy of specific design choices available for these functionalities and illustrate them with examples from specific systems.

Table 2. The application of the research framework for the analysis of objective I
Objective (system-independent): I. Evaluation elicitation
Use case I (system-independent): Eliciting quantitative peer evaluation
  Functionality (system-dependent): Rubrics. Design options: Holistic; Specific/analytic
  Functionality (system-dependent): Scales. Design options: Rating; Ranking
Use case II (system-independent): Eliciting qualitative peer evaluation, critiques, comments
  Functionality (system-dependent): Critique artifact media types. Design options: Plain text; Rich text / hypertext / URL; Inline file annotation; Multimedia attachments
  Functionality (system-dependent): Contextualization of critiques. Design options: Non-contextualized; Contextualized

3.1 Eliciting Quantitative Peer Evaluation

3.1.1 Rubrics
Rubrics are used at all levels of education to evaluate a wide variety of products. A rubric is an assessment tool that communicates expectations for an assignment submission. A well-designed rubric consists of three essential components: evaluation criteria, quality-level definitions, and a scoring strategy [20]. Evaluation criteria are the factors deemed to be important on which the goodness of the submission will be judged. Quality-level definitions specify achievement levels (e.g., "meets standards", "needs improvement") and help assessors understand what evidences those levels. The scoring strategy translates reviewer judgments into usable, often numeric, representations.

Rubrics can be categorized as holistic or specific/analytic [13, 15]. In a holistic rubric, a submission is judged as a whole, with a single value or category representing its overall quality. In contrast, a specific/analytic rubric requires evaluations on several distinct criteria.

In the context of peer review, we found that the term "rubric" has been used more loosely to describe a multitude of evaluative processes and structures. Some systems offer wide flexibility in the design of rubrics that may or may not contain all three elements, while other systems are more restrictive. This leaves to the instructor assessment decisions such as the type of rubric, the number of criteria, the number of achievement levels, the point value for each level, and whether to use definitions, numeric scales, or both to delineate achievement levels. For example, in Canvas and Expertiza, a rubric can vary from a series of open-ended questions with no established quality levels or quantitative scores to an elaborate rubric with multiple criteria, detailed definitions, and a complex scoring strategy. In CritViz, a rubric is a set of questions that reviewers have to consider when evaluating peers' submissions. Mobius SLIP supports the creation of a qualitative rubric complete with the essential components but elicits holistic quantitative evaluation (Figure 2).

Typically, online peer review systems, e.g., Expertiza, Calibrated Peer Review, Peerceptiv, and Canvas, support specific/analytic rubrics because they generate more detailed feedback that helps students understand their performance on each of the criteria. Specific rubrics provide a more granular picture of artifacts' strengths and weaknesses and more guidance to students as they complete subsequent revisions or assignments. Some systems, such as Mobius SLIP and CritViz, favor holistic evaluations (even if some specific rubrics are provided); noticeably, these systems also rely on ranking (rather than rating) evaluations. Holistic rubrics make more sense for overall ranking, as it may be tedious for evaluators to rank multiple products on each of several criteria.

Limited choices in rubric design reduce the instructor's control over the pedagogical implications of using different rubric types, but free them to focus on other aspects of instruction. Instructors new to assessment may appreciate not having to make too many of these decisions. Some systems fall in the middle, dictating some parameters but allowing flexibility with others. For example, Peerceptiv allows instructors to determine the number of criteria, but requires each criterion to have a 7-point scale, unaccompanied by elaborated definitions. If rubric design is a critical factor in the institution's use of the peer review process, instructors must carefully vet and select the system which best fits their assignment and assessment requirements.

In the context of peer review, rubrics are also associated with higher student achievement [18] and higher reliability of peer evaluations [12, 30]. Several studies suggested that students need to engage with the rubrics in order for them to be effective [20]. Providing rubrics when an assignment is first given and asking students to complete self- and peer reviews were shown to be effective ways to facilitate this engagement.

Figure 2. Screenshots of selected online peer assessment systems: (a) Canvas, (b) Expertiza, (c) CritViz, (d) Mobius SLIP.

While rubrics are typically viewed as an assessment tool, many researchers suggested that they have a second, often overlooked, instructional purpose. When used formatively, rubrics can illuminate strengths and weaknesses and suggest a direction for future improvements. Rubrics help students understand what to change in their work and help educators see where future instruction should be directed. Interestingly, studies of student perceptions of rubrics suggested that students value these formative purposes. Students observed that rubrics clarify the objectives for their work, help them plan their approach, check their work, and reflect on feedback from others. They also report producing better submissions, earning higher grades, and feeling less anxious about assignments when they are provided with a rubric [20]. Empirical studies support students' impressions, providing evidence that rubrics support teaching and learning and contribute to higher achievement [13, 20].

Online peer review systems offer a variety of means for supporting the formative use of rubrics. Some allow different rubrics to be used for different rounds of peer review within a single assignment; others offer calibration to show students how their peer evaluations compare to the instructor's assessment of a selected sample assignment. Many systems allow student achievement scores to be calculated in different ways depending on whether peer review is used for formative or summative purposes. These features, while important to this discussion, are beyond the purview of this paper and will be discussed in a future publication.
3.1.2 Scales
In general, quantitative evaluations may be conducted using either ranking or rating [9]. Rating refers to the comparison of different items using a common absolute, or cardinal, scale (either numeric or categorical). Ranking, sometimes also called forced-distribution rating, means comparing different items directly to one another on a relative, or ordinal, scale [22]. Both ranking and rating have their strengths and weaknesses, and there is still little consensus as to which has greater predictive validity [1, 16, 17].

Generally, ranking and rating are expected to correlate, but some studies have demonstrated that ordinal (ranking-based) evaluations contain significantly less noise than cardinal (rating-based) evaluations [23, 32]. A cardinal scale in the context of peer evaluations is also susceptible to score inflation, whereas an ordinal scale is immune to this problem [9]. When a cardinal scale is used, an evaluator may "smokescreen" his preferences by giving all evaluated artifacts the same rating, and may severely inflate scores by giving all artifacts the same high ratings (similarly, he can severely degrade scores by giving all artifacts the same low ratings). Thus, a cardinal scale is very vulnerable to social or personal biases (e.g., "never give the highest rating") and idiosyncratic shocks (e.g., mood or inconsistency in evaluation style). When an ordinal scale is used, an evaluator must construct an explicit total ordering of artifacts based on their perceived quality [24]. This makes the evaluation more robust. Psychological evidence suggests that evaluators are better at making comparative judgments than absolute ones [26, 31].

The ordinal scale also has its drawbacks. It forces evaluators to discriminate between artifacts that may be perceived to have very similar quality as much as between artifacts whose qualities may be far apart. Some ordinal scales may implicitly emphasize items earlier in the list and lead to their higher ranking. Evaluating on ordinal scales places a higher cognitive load on the evaluators because it requires them to compare multiple items against each other. Thus, rubrics that use ordinal scales tend to contain fewer criteria, and consequently, they may not draw evaluators' attention to as many salient features of the artifact under review.

Scores from rating-based systems are usually determined by calculating a weighted average of the scores given on various criteria, which means they depend on multiple, independent decisions by each evaluator, rather than a single decision of how to rank a submission relative to others. Most online peer assessment systems are rating-based, e.g., Calibrated Peer Review, Peerceptiv, and Expertiza. Typically, a rating scale is presented as a drop-down menu or validated text box.

Ranking-based systems have also been gaining prominence thanks to the strengths of the ordinal evaluation approach. In CritViz, for example, students have to "drag and drop" submission artifacts to position them in rank order according to the reviewer's perception of their quality. Yet other systems attempt to take advantage of combining both evaluation scales in a single control. For example, in Mobius SLIP, the SLIP Slider control (Figure 2) allows recording ratings on a 0-100 scale as well as rankings, which can then be used separately to generate analytics and grading data. Naturally, for such controls to function, they must exclude the possibility of assigning the same rating to any two artifacts, but they allow placing two artifacts close to each other to indicate approximately the same level of quality. Another example of a system that supports both ranking and rating scales is peerScholar [14], where the instructor can configure an assignment to have either a rating scale or a ranking scale. Inasmuch as long rubrics also seem to elicit more textual feedback, systems that use ranking may provide the author with less feedback on the quality of the submission and less guidance on how to improve it [35].
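As a concrete illustration of the two elicitation styles discussed above, the following sketch (a hypothetical construction, not code from any of the surveyed systems) contrasts a rating-based score, computed as a weighted average over rubric criteria, with a ranking-based evaluation recorded as a single ordering of the reviewed artifacts.

```python
from typing import Dict, List

def rating_score(criterion_scores: Dict[str, float],
                 weights: Dict[str, float]) -> float:
    """Rating (cardinal) scale: weighted average of the scores given on each criterion."""
    total_weight = sum(weights.values())
    return sum(weights[name] * score for name, score in criterion_scores.items()) / total_weight

def ranking_points(ordering: List[str]) -> Dict[str, int]:
    """Ranking (ordinal) scale: one total ordering; the top-ranked artifact gets the most points."""
    n = len(ordering)
    return {artifact: n - position for position, artifact in enumerate(ordering)}

# A rating-based reviewer makes several independent judgments about one submission...
print(rating_score({"clarity": 4, "originality": 5},
                   {"clarity": 0.5, "originality": 0.5}))        # 4.5

# ...whereas a ranking-based reviewer makes a single relative judgment about all of them.
print(ranking_points(["submission_B", "submission_A", "submission_C"]))
# {'submission_B': 3, 'submission_A': 2, 'submission_C': 1}
```

Under this sketch, the rating score depends only on the submission being reviewed, while the ranking points depend on the whole set of assigned submissions, which is the trade-off discussed above.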
3.2 Eliciting Qualitative Peer Evaluation

3.2.1 Critique Artifact Media Types
Critiques, as the verbal component of reviews, can be provided in different formats. The most obvious and typical design choice is to prompt the reviewer to post a plain-text comment in a text box. Most systems provide a web form combining rubric questions and text boxes to fill out. Plain-text feedback is the most basic and arguably the fastest way to provide feedback. Textual critiques can be enhanced by allowing rich-text formatting (varying font faces and sizes, bullet points, alignment, hyperlinks, etc.) using WYSIWYG editors. Including a hyperlink in the text feedback further enhances the options by referencing an externally hosted copy of the submission artifact (which can be edited and/or annotated) or by referencing externally hosted multimedia critique artifacts, such as voice and video recordings, screencasts, and HTML documents. Only a few systems (e.g., Canvas) allow internal hosting of multimedia critique artifacts, but arguments have been made that this type of critique substantially improves the provider's efficiency and the recipient's experience.

The next step up in providing rich critiques is inline file annotation. Several systems take advantage of third-party APIs allowing inline annotation of submission artifacts uploaded as files. For instance, Mobius SLIP and Canvas utilize a document viewer called Crocodoc, which renders various file formats as an HTML document and allows reviewers to select portions of the document and annotate them in place. Annotation includes highlighting, commenting, and adding text and primitive graphic elements. This feature is similar to adding comments in a Microsoft Word file or a Google doc. Crocodoc supports both non-anonymous and anonymous file annotation. While the Crocodoc API is used by a number of systems, after its acquisition by Box in 2013, it is expected to be replaced by a new API with similar, and possibly more advanced, inline file annotation functionality.
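Purely as a hypothetical illustration (this is not the Crocodoc API nor any surveyed system's data model), an inline annotation can be thought of as a critique anchored to a location inside the submission artifact, while a detached comment is the same record without an anchor; this also anticipates the contextualization distinction discussed in Section 3.2.2.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Anchor:
    """Location of a highlighted span inside the rendered submission document."""
    page: int
    start_offset: int   # character offsets within the page, for illustration only
    end_offset: int

@dataclass
class Critique:
    """Hypothetical critique record: with an anchor it is contextualized (inline);
    without one it is a detached, whole-submission comment."""
    reviewer_id: str
    submission_id: str
    comment: str
    anchor: Optional[Anchor] = None
    anonymous: bool = True

inline = Critique("r17", "s42", "This claim needs a supporting citation.",
                  anchor=Anchor(page=2, start_offset=120, end_offset=168))
detached = Critique("r17", "s42",
                    "Well organized overall, but the evidence section is thin.")
```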
Web annotation is another possible implementation of inline annotation in web-based online peer assessment systems [34], but no systems in our illustrative sample rely on it; therefore, this option needs to be explored further. To the best of our knowledge, no existing online peer review system offers its own "native," custom-built inline file annotation functionality.

Since text critiques may not offer the desired expressiveness and clarity that other media may provide, users have been requesting the ability for reviewers to attach multimedia files containing critique artifacts (e.g., images, voice or video recordings) as an alternative to inserting URLs to such externally hosted files in the plain- or rich-text comments. Such an option, for example, would allow reviewers who are more comfortable using traditional media (e.g., pen and paper) to write their critiques offline, scan them into PDF or image files, and then attach them to the original submission artifacts. As another example, some reviewers may also be more productive when providing their critiques as voice or screencast recordings made directly in the system. In our sample, only Canvas offers such options, but since they are available in other social learning applications, such as VoiceThread (voicethread.com), it is reasonable to expect increasing availability of such functionality in online peer assessment in the near future.

3.2.2 Contextualization of Critiques
A number of factors influence how well the author of the submission artifact is able to understand and relate to a reviewer's feedback: the spatial relationship of the critique artifacts with the submission artifact, placing critiques in the specific context of the submission artifact, and the granularity of comments. For example, directly annotating an issue in a fragment of the submission artifact, rather than trying to explain in an overall, "detached" critique where the issue is located and how to fix it, simplifies communication between the reviewer and the author. Moreover, annotation is more suitable for providing specific fine-grained comments, while filling out a text box is more appropriate for more global comments.

We define this aspect of eliciting qualitative evaluation as the contextualization of critiques. Naturally, the system interface design determines how much critiques can be contextualized in relation to submission artifacts. Moreover, the interface implementation of other functionalities, such as rubrics, scales, and critique artifact media types, closely interplays with the implementation of critique contextualization. Contextualization of critiques thus has two options: (a) "detached", non-contextualized ("single comment per submission"); (b) contextualized ("multiple comments on various fragments of the submission"). While the former is typically available in all systems in our sample, the latter is implemented either as an entry space (text box) associated with a specific criterion/question in the rubric (e.g., Expertiza, CritViz) or as inline file annotation with Crocodoc (e.g., Mobius SLIP, Canvas). Further exploration of this functionality and the design options for its implementation will be provided in the full study.

4. CONCLUSION
We have presented our initial attempt at formulating the research framework for a taxonomy of educational online peer assessment systems. This framework enables researchers of technology-supported peer assessment to understand the current landscape of technologies supporting student peer review and assessment, specifically, its affordances and constraints. Importantly, this framework helps identify the major research questions in existing and potential research and formulate an agenda for future studies. It also informs educators and system design practitioners about use cases and design options in this particular branch of educational technology.

Using a grounded theory approach, we identified several primary objectives for online peer assessment systems and combined them in the research framework. To illustrate the application of this framework in this research-in-progress paper, we presented a sample analysis of how the use cases supporting the objective of eliciting quantitative and qualitative peer evaluations are implemented in several different systems. In the future full study, we intend to apply the multi-case method to conduct a complete analysis of the objectives based on a large sample of online peer assessment systems.

5. REFERENCES
[1] Alwin, D. F., & Krosnick, J. A. 1985. The Measurement of Values in Surveys: A Comparison of Ratings and Rankings. Public Opinion Quarterly, 49(4), 535–552. http://doi.org/10.1086/268949
[2] Babik, D., Iyer, L., & Ford, E. 2012. Towards a Comprehensive Online Peer Assessment System: Design Outline. Lecture Notes in Computer Science, 7286 LNCS, 1–8.
[3] Babik, D., Singh, R., Zhao, X., & Ford, E. 2015. What You Think and What I Think: Studying Intersubjectivity in Knowledge Artifacts Evaluation. Information Systems Frontiers. http://doi.org/10.1007/s10796-015-9586-x
[4] Bouzidi, L., & Jaillet, A. 2009. Can Online Peer Assessment Be Trusted? Educational Technology & Society, 12(4), 257–268.
[5] Cho, K., & Schunn, C. D. 2007. Scaffolded Writing and Rewriting in the Discipline: A Web-based Reciprocal Peer Review System. Computers & Education, 48(3), 409–426. http://doi.org/10.1016/j.compedu.2005.02.004
[6] Davies, P. 2000. Computerized Peer Assessment. Innovations in Education and Teaching International, 37(4), 346–355.
[7] De Alfaro, L., & Shavlovsky, M. 2014. CrowdGrader: A Tool for Crowdsourcing the Evaluation of Homework Assignments. In Proceedings of the 45th ACM Technical Symposium on Computer Science Education (pp. 415–420). New York, NY, USA: ACM. http://doi.org/10.1145/2538862.2538900
[8] Doiron, G. 2003. The Value of Online Student Peer Review, Evaluation and Feedback in Higher Education. CDTL Brief, 6(9), 1–2.
[9] Douceur, J. R. 2009. Paper Rating vs. Paper Ranking. ACM SIGOPS Operating Systems Review, 43(2), 117–121.
[10] Gehringer, E., Ehresman, L., Conger, S. G., & Wagle, P. 2007. Reusable Learning Objects Through Peer Review: The Expertiza Approach. Innovate: Journal of Online Education, 3(5), 4.
[11] Gikandi, J. W., Morrow, D., & Davis, N. E. 2011. Online Formative Assessment in Higher Education: A Review of the Literature. Computers & Education, 57(4), 2333–2351. http://doi.org/10.1016/j.compedu.2011.06.004
[12] Hafner, J., & Hafner, P. 2003. Quantitative Analysis of the Rubric as an Assessment Tool: An Empirical Study of Student Peer-Group Rating. International Journal of Science Education, 25(12), 1509–1528.
[13] Jonsson, A., & Svingby, G. 2007. The Use of Scoring Rubrics: Reliability, Validity and Educational Consequences. Educational Research Review, 2(2), 130–144. http://doi.org/10.1016/j.edurev.2007.05.002
[14] Joordens, S., Desa, S., & Paré, D. 2009. The Pedagogical Anatomy of Peer Assessment: Dissecting a peerScholar Assignment. Journal of Systemics, Cybernetics & Informatics, 7(5). Retrieved from http://www.iiisci.org/journal/CV$/sci/pdfs/XE123VF.pdf
[15] Kavanagh, S., & Luxton-Reilly, A. 2016. Rubrics Used in Peer Assessment (pp. 1–6). ACM Press. http://doi.org/10.1145/2843043.2843347
[16] Krosnick, J. A. 1999. Maximizing Questionnaire Quality. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of Political Attitudes (pp. 37–57). San Diego, CA, US: Academic Press.
[17] Krosnick, J. A., Thomas, R., & Shaeffer, E. 2003. How Does Ranking Rate?: A Comparison of Ranking and Rating Tasks. In Conference Papers – American Association for Public Opinion Research.
[18] Liu, C. C., Lu, K. H., Wu, L. Y., & Tsai, C. C. 2016. The Impact of Peer Review on Creative Self-efficacy and Learning Performance in Web 2.0 Learning Activities. Journal of Educational Technology & Society, 19(2), 286–297. Retrieved from http://www.jstor.org/stable/jeductechsoci.19.2.286
[19] Luxton-Reilly, A. 2009. A Systematic Review of Tools That Support Peer Assessment. Computer Science Education, 19(4), 209–232. http://doi.org/10.1080/08993400903384844
[20] Reddy, Y. M., & Andrade, H. 2010. A Review of Rubric Use in Higher Education. Assessment & Evaluation in Higher Education, 35(4), 435–448. http://doi.org/10.1080/02602930902862859
[21] Russell, A. A. 2001. Calibrated Peer Review: A Writing and Critical-Thinking Instructional Tool. UCLA, Chemistry. Retrieved from http://www.unc.edu/opt-ed/eval/bp_stem_ed/russell.pdf
[22] Schleicher, D. J., Bull, R. A., & Green, S. G. 2008. Rater Reactions to Forced Distribution Rating Systems. Journal of Management, 35(4), 899–927. http://doi.org/10.1177/0149206307312514
[23] Shah, N. B., Bradley, J. K., Parekh, A., Wainwright, M., & Ramchandran, K. 2013. A Case for Ordinal Peer Evaluation in MOOCs. NIPS Workshop on Data Driven Education. Retrieved from http://lytics.stanford.edu/datadriveneducation/papers/shahetal.pdf
[24] Slovic, P. 1995. The Construction of Preference. American Psychologist, 50(5), 364–371. http://doi.org/10.1037/0003-066X.50.5.364
[25] Søndergaard, H., & Mulder, R. 2012. Collaborative Learning Through Formative Peer Review: Pedagogy, Programs and Potential. Computer Science Education, December 2012, 1–25.
[26] Spetzler, C. S., & Stael Von Holstein, C.-A. S. 1975. Probability Encoding in Decision Analysis. Management Science, 22(3), 340–358.
[27] Tinapple, D., Olson, L., & Sadauskas, J. 2013. CritViz: Web-Based Software Supporting Peer Critique in Large Creative Classrooms. Bulletin of the IEEE Technical Committee on Learning Technology, 15(1), 29.
[28] Topping, K. J. 1998. Peer Assessment between Students in Colleges and Universities. Review of Educational Research, 68(3), 249–276. http://doi.org/10.3102/00346543068003249
[29] Topping, K. J. 2005. Trends in Peer Learning. Educational Psychology, 25(6), 631–645. http://doi.org/10.1080/01443410500345172
[30] Vista, A., Care, E., & Griffin, P. 2015. A New Approach Towards Marking Large-scale Complex Assessments: Developing a Distributed Marking System that Uses an Automatically Scaffolding and Rubric-targeted Interface for Guided Peer-review. Assessing Writing, 24, 1–15.
[31] Wang, H., Dash, D., & Druzdzel, M. J. 2002. A Method for Evaluating Elicitation Schemes for Probabilistic Models. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 32(1), 38–43.
[32] Waters, A., Tinapple, D., & Baraniuk, R. 2015. BayesRank: A Bayesian Approach to Ranked Peer Grading. In ACM Conference on Learning at Scale, Vancouver.
[33] Jigsaw (teaching technique). 2016, June 20. In Wikipedia, the free encyclopedia. Retrieved from https://en.wikipedia.org/wiki/Jigsaw_(teaching_technique)
[34] Web annotation. 2016, May 20. In Wikipedia, the free encyclopedia. Retrieved from https://en.wikipedia.org/w/index.php?title=Web_annotation&oldid=721222451
[35] Yadav, R. K., & Gehringer, E. F. 2016. Metrics for Automated Review Classification: What Review Data Show. In Y. Li, M. Chang, M. Kravcik, E. Popescu, R. Huang, Kinshuk, & N.-S. Chen (Eds.), State-of-the-Art and Future Directions of Smart Learning (pp. 333–340). Springer Singapore. Retrieved from http://link.springer.com/chapter/10.1007/978-981-287-868-7_41