Peer assessment

Peer assessment, or self-assessment, is a process whereby students or their peers grade assignments or tests based on a teacher's benchmarks.[1] The practice is employed to save teachers time and to improve students' understanding of course materials and their metacognitive skills. Rubrics are often used in conjunction with self- and peer-assessment.[2]

Advantages of self and peer assessment

Saves teachers' time

Student-graded assignments can save teachers time[3] because an entire class's papers can be graded together in the time that it would take a teacher to grade one paper. Moreover, rather than having a teacher rush through each paper, students are able to take their time correcting them. Because each student has only one paper to grade, they can spend more time on it and do a more thorough job.[4]

Faster feedback

Having students grade papers in class or assess their peers' oral presentations decreases the time taken for students to receive their feedback. Instead of waiting for feedback on their work, students have assignments graded soon after completion, before they have moved on to new material and while the information is still fresh in their minds.[5]

The faster turnaround of feedback has also been shown to increase the likelihood that the recipient adopts it. A controlled experiment conducted in a massive open online course (MOOC) setting found that students' final grades improved when feedback was delivered quickly, but not when it was delayed by 24 hours.[6]

Pedagogical

The teacher's evaluative role can make students focus more on grades than on seeking feedback.[7] Students can learn from grading the papers[5] or assessing the oral presentations of others. Often, teachers do not go over test answers, so students have no chance to learn what they did wrong. Self- and peer-assessment allow teachers to help students understand the mistakes that they have made. This can improve subsequent work, gives students time to digest information, and may lead to better understanding.[8] A study by Sadler and Good found that students who self-graded their tests did better on later tests. The students could see what they had done wrong and were able to correct such errors in later assignments. After peer grading, students did not necessarily achieve higher results.[9]

Metacognitive

Through self- and peer-assessment students are able to see mistakes in their thinking and can correct any problems in future assignments. By grading assignments, students may learn how to complete assignments more accurately and how to improve their test results.[5]

Professors Lin-Agler, Moore, and Zabrucky conducted an experiment in which they found “that students are able to use their previous experience from preparing for and taking a test to help them build a link between their study time allocation.”[10] After participating in self- and peer-assessment, students can not only improve their ability to study for a test but also enhance their ability to evaluate others through improved metacognitive thinking.[11]

Attitude

If self- and peer-assessment are implemented, students can come to see tests not as punishments but as useful feedback.[11] Hal Malehorn says that by using peer evaluation, classmates can work together for “common intellectual welfare” and that it can create a “cooperative atmosphere” for students instead of one where students compete for grades.[12] In addition, when students assess the work of their fellow students, they also reflect on their own work. This reflective process stimulates action for improvement.[13]

However, in the Supreme Court case Owasso Independent School District v. Falvo, the school was sued after a student was victimized when other students learned that he had received a low test score.[14] Malehorn describes what an idealized version of peer-assessment can do for classroom attitude; in practice, students can be victimized, as that case shows.

Teacher grading agreement

One concern about self- and peer-assessment is that students may give higher grades than teachers. Teachers want to reduce grading time but not at the cost of losing accuracy.[15]

Support

A study by Sadler and Good has shown that there is a high level of agreement between grades assigned by teachers and students as long as students are able to understand the teacher's quality requirements. They also report that teacher grading can be more accurate as a result of using self- and peer-assessment. If teachers look at how students grade themselves, then they have more information available from which to assign a more accurate grade.[16]

Opposition

However, Sadler and Good warn that there is some disagreement. They suggest that teachers implement systems to moderate grading by students in order to catch unsatisfactory work.[16] Another study reported that grade inflation did occur, as students tended to grade themselves higher than a teacher would have. This would suggest that self- and peer-assessment are not an accurate method of grading due to divergent results.[17]

Comparison

According to the study by Sadler and Good, students who peer-grade tend to undergrade and students who self-grade tend to overgrade. However, a large majority of students do get within 5% of the teacher’s grade. Relatively few self-graders undergrade and relatively few peer graders overgrade.[15]
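The over- and undergrading comparison above can be expressed as a small classification rule. The sketch below is illustrative only: the function name, the 0 to 100 grading scale, and the reading of "within 5%" as five points on that scale are assumptions, not details taken from the study.

```python
def classify_agreement(student_grade, teacher_grade, tolerance=5.0):
    """Compare a student-assigned grade with the teacher's grade.

    Both grades are assumed to be on a 0-100 scale; `tolerance` is the
    assumed number of points within which the two grades count as agreeing.
    """
    difference = student_grade - teacher_grade
    if abs(difference) <= tolerance:
        return "agree"
    return "overgrade" if difference > 0 else "undergrade"


# Hypothetical (student-assigned, teacher-assigned) grades for three papers.
pairs = [(88, 85), (95, 85), (70, 85)]
labels = [classify_agreement(student, teacher) for student, teacher in pairs]
print(labels)  # ['agree', 'overgrade', 'undergrade']
```

Counting these labels separately for self-graded and peer-graded papers would give the kind of breakdown the paragraph describes.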

Perhaps one of the most prominent models of peer-assessment can be found in design studios.[18][19] One of the benefits of such studios comes from structured contrasts, which can help novices notice differences that might otherwise have been accessible only to experts.[20] In fact, it is a well-known strategy for designers to use comparisons to get inspired.[21][22] Some researchers have designed systems that surface helpful comparative examples in educational settings.[23][24][25] However, what makes a good comparison remains unclear. Sadler's general guidance on good feedback describes three characteristics (specific, actionable, and justified)[26] and has been widely adopted in feedback research. Yet because each piece of work to be evaluated differs so vastly in content, the path towards those qualities in a specific piece of feedback remains largely unknown. Effective feedback is not only written actionably, specifically, and in a justified manner, but more importantly, contains good content: it points out relevant things, brings in new insights, and leads its recipients to consider the problem from a different angle or to re-represent it completely. This requires content-specific customization.

Rubrics

Purpose

Students need guidelines to follow before they are able to grade more open-ended questions. These often come in the form of rubrics, which lay out different objectives and how much each is worth when grading.[11] Rubrics are often used for writing assignments.[27]

Examples of objectives

Expression of ideas
Organization of content
Originality
Subject knowledge
Content
Curriculum alignment
Balance
Voice
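A rubric of this kind, with objectives and a worth assigned to each, amounts to a weighted sum. The sketch below is a minimal illustration under stated assumptions: the function name, the weights, and the 0.0 to 1.0 rating scale are hypothetical, and only the objective names come from the list above.

```python
def rubric_score(ratings, weights):
    """Combine per-objective ratings into a percentage score.

    `weights` maps each objective to how much it is worth;
    `ratings` maps each objective to a rating between 0.0 and 1.0.
    """
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"no rating given for: {sorted(missing)}")
    total_weight = sum(weights.values())
    earned = sum(weights[objective] * ratings[objective] for objective in weights)
    return 100.0 * earned / total_weight


# Hypothetical weights for four of the objectives listed above.
weights = {
    "Expression of ideas": 30,
    "Organization of content": 30,
    "Originality": 20,
    "Subject knowledge": 20,
}
# Hypothetical ratings a peer grader might give one essay.
ratings = {
    "Expression of ideas": 0.8,
    "Organization of content": 0.5,
    "Originality": 1.0,
    "Subject knowledge": 0.75,
}
print(rubric_score(ratings, weights))  # 74.0
```

Fixing the weights in advance is what lets different student graders arrive at comparable scores for the same piece of work.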

Group work

One area in which self- and peer-assessment is being applied is group projects. Teachers can give projects a final grade but also need to determine what grade each individual in the group deserves. Students can grade their peers, and individual grades can be based on these assessments. Nevertheless, there are problems with this grading method. If students grade each other unfairly, the overall grades skew in different directions.[28]
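One simple way to base individual grades on peer assessments is to scale the group grade by how each member's average peer rating compares with the group average. The sketch below assumes this weighting scheme purely for illustration; it is not taken from the cited source, and the function name, the cap at a maximum grade, and the sample ratings are all hypothetical.

```python
def individual_grades(group_grade, ratings_received, max_grade=100.0):
    """Derive individual grades from one group grade and peer ratings.

    `ratings_received` maps each student to the list of ratings that
    student received from the other group members.
    """
    means = {
        student: sum(scores) / len(scores)
        for student, scores in ratings_received.items()
    }
    group_mean = sum(means.values()) / len(means)
    if group_mean == 0:
        raise ValueError("all peer ratings are zero; cannot weight grades")
    # A member rated above the group average gets more than the group grade.
    return {
        student: min(max_grade, group_grade * mean / group_mean)
        for student, mean in means.items()
    }


# Hypothetical group of three with a project grade of 80.
ratings_received = {"Ana": [4, 4], "Ben": [5, 5], "Cy": [3, 3]}
grades = individual_grades(80, ratings_received)
print(grades)  # {'Ana': 80.0, 'Ben': 100.0, 'Cy': 60.0}
```

Because every individual grade depends directly on the ratings, any unfair rating shifts the result, which is why the problems discussed next matter.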

Overgenerosity

Some students may give all the other students remarkably high grades, which will cause their own scores to be lower compared to the others. This can be addressed by having students grade themselves as well, so that their generosity also extends to themselves and raises their own grades by the same amount. However, this does not compensate for students who feel they did not work at their best and grade themselves too harshly.[29]
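The compensating effect of self-grading can be checked with a small worked example. The sketch assumes a hypothetical three-person group in which student A is the overgenerous rater; the function name, the 10-point scale, and all scores are made up for illustration.

```python
def totals_received(ratings, include_self):
    """Sum the scores each student receives.

    `ratings[rater][ratee]` is the score `rater` gave to `ratee`.
    Self-ratings are counted only when `include_self` is True.
    """
    totals = {student: 0 for student in ratings}
    for rater, given in ratings.items():
        for ratee, score in given.items():
            if ratee == rater and not include_self:
                continue
            totals[ratee] = totals.get(ratee, 0) + score
    return totals


# Student A gives everyone (including themselves) 10; B and C give everyone 7.
ratings = {
    "A": {"A": 10, "B": 10, "C": 10},
    "B": {"A": 7, "B": 7, "C": 7},
    "C": {"A": 7, "B": 7, "C": 7},
}
print(totals_received(ratings, include_self=False))  # {'A': 14, 'B': 17, 'C': 17}
print(totals_received(ratings, include_self=True))   # {'A': 24, 'B': 24, 'C': 24}
```

Without self-ratings the generous student ends up behind the others; once self-ratings are counted, the totals are level again.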

Creative accounting

Some students will award everybody low marks and themselves exceedingly high marks to bias the data. This can be countered by checking students' grades and making sure that they are consistent with where in the group their peers graded them.[30]
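A consistency check of this kind can be sketched by comparing each student's self-rating with the average rating that student received from peers. The function name, the `max_gap` threshold, and the sample scores below are assumptions chosen for illustration, not a procedure from the cited source.

```python
def flag_inflated_self_ratings(ratings, max_gap=2.0):
    """Return students whose self-rating exceeds their peer average.

    `ratings[rater][ratee]` is the score `rater` gave to `ratee`.
    A student is flagged when the self-rating is more than `max_gap`
    points above the mean of the ratings received from peers.
    """
    flagged = []
    for student, given in ratings.items():
        if student not in given:
            continue  # no self-rating to check
        peer_scores = [
            other_given[student]
            for rater, other_given in ratings.items()
            if rater != student and student in other_given
        ]
        if not peer_scores:
            continue  # nobody else rated this student
        peer_mean = sum(peer_scores) / len(peer_scores)
        if given[student] - peer_mean > max_gap:
            flagged.append(student)
    return flagged


# Hypothetical group: A rates peers low and self high.
ratings = {
    "A": {"A": 10, "B": 3, "C": 3},
    "B": {"A": 6, "B": 7, "C": 8},
    "C": {"A": 6, "B": 8, "C": 7},
}
print(flag_inflated_self_ratings(ratings))  # ['A']
```

A flag is only a prompt for the teacher to look more closely; it does not by itself show that the student acted in bad faith.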

Individual penalization

If all of the students go against one student because they feel that the individual did little work, then that student will receive an exceptionally low grade. This is permissible if the student in question really did do very little work, but it may require the instructor's intervention before it becomes the final result.[30]

Classroom participation

While it is difficult to grade students on participation in a classroom setting because of its subjective nature, one method of grading participation is to use self- and peer-assessment. Professors Ryan, Marshall, Porter, and Jia conducted an experiment to see if using students to grade participation was effective. They found that there was a difference between a teacher's evaluation of participation and a student's. However, there was no academic significance, indicating that students' final grades were not affected by the difference between a teacher's evaluation and a student's. They concluded that self- and peer-assessment is an effective way to grade classroom participation.[31]

Peer-assessment at scale

Peer assessment is also the gold standard for many creative tasks, ranging from reviewing the quality of scholarly articles or grant proposals to design studios. However, as the number of assessments increases, challenges arise. Because no single assessor has a global understanding of the entire pool of submissions, local biases in judgment may be introduced (e.g. the range of a scale an assessor uses may be affected by the particular submissions that assessor reviews), and noise may be added to the ranking aggregated from individual peer assessments. At the same time, because the ranked outcome is of utmost interest in many situations (e.g. allocating research grants to proposals or assigning letter grades to students), ways to systematically aggregate peer assessments and recover the ranked order of submissions have many practical implications.

To tackle this, researchers have studied (1) evaluation schemes (e.g. ordinal grading),[32] (2) algorithms that aggregate pairwise evaluations to estimate the global ranking of submissions more robustly,[33] (3) ways to produce better pairings for exchanging feedback by considering conflicts of interest,[34] and (4) a framework that reduces the error between individual- and community-level judgments of the value of a scholarly article.[35]
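To make the idea of aggregating pairwise evaluations into a global ranking concrete, the sketch below fits a plain Bradley-Terry model with a standard iterative update. This model is a stand-in chosen for illustration and is not claimed to be the algorithm of any cited work; the function name, the iteration count, and the sample comparisons are assumptions.

```python
from collections import defaultdict


def bradley_terry_ranking(comparisons, iterations=100):
    """Rank submissions from pairwise judgments, strongest first.

    `comparisons` is a list of (winner, loser) pairs, one per peer judgment.
    """
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denominator = 0.0
            for j in items:
                count = pair_counts.get(frozenset((i, j)), 0)
                if j != i and count:
                    denominator += count / (strength[i] + strength[j])
            # Items that were never compared keep their current strength.
            updated[i] = wins[i] / denominator if denominator else strength[i]
        total = sum(updated.values())
        strength = {item: value / total for item, value in updated.items()}
    return sorted(items, key=lambda item: strength[item], reverse=True)


# Hypothetical judgments: each pair of submissions was compared twice.
comparisons = [
    ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "C"),
    ("B", "C"), ("C", "B"),
]
print(bradley_terry_ranking(comparisons))  # ['A', 'B', 'C']
```

Each assessor only ever sees two submissions at a time, yet the fitted strengths place all submissions on one scale. A submission that never wins a comparison is driven to the bottom of the ranking, so in practice every submission should be compared several times.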

Legality

The legality of self- and peer-assessment was challenged in the United States Supreme Court case of Owasso Independent School District v. Falvo. Kristja Falvo sued the school district where her son attended school because it used peer-assessment and he was teased about a low score. The teacher's right to use self- and peer-assessment was upheld by the court.[36]

Notes

1. Sadler, Philip M., and Eddie Good. The Impact of Self- and Peer-Grading on Student Learning, p. 2.
2. Malehorn, Hal. Ten measures better than grading, p. 323.
3. Searby, Mike, and Tim Ewers. An evaluation of the use of peer assessment in higher education: A case study in the School of Music, p. 371.
4. Sadler, Philip M., and Eddie Good. The Impact of Self- and Peer-Grading on Student Learning, p. 2.
5. Sadler, Philip M., and Eddie Good. The Impact of Self- and Peer-Grading on Student Learning, p. 2.
6. Kulkarni, Chinmay E., Michael S. Bernstein, and Scott R. Klemmer. "PeerStudio: rapid peer feedback emphasizes revision and improves performance." Proceedings of the second (2015) ACM conference on learning@ scale. ACM, 2015.
7. J. Scott Armstrong (2012). "Natural Learning in Higher Education". Encyclopedia of the Sciences of Learning.
8. Ngar-Fun, Liu, and David Carless. Peer feedback: the learning element of peer assessment, p. 281.
9. Sadler, Philip M., and Eddie Good. The Impact of Self- and Peer-Grading on Student Learning, p. 24.
10. Lin-Agler, Lin Miao, DeWayne Moore, and Karen M. Zabrucky. EFFECTS OF PERSONALITY ON METACOGNITIVE SELF-ASSESSMENTS, p. 461.
11. Sadler, Philip M., and Eddie Good. The Impact of Self- and Peer-Grading on Student Learning, p. 3.
12. Malehorn, Hal. Ten measures better than grading, p. 323.
13. Kristanto, Yosep Dwi (2018). "Technology-enhanced pre-instructional peer assessment: Exploring students' perceptions in a Statistical Methods course". Research and Evaluation in Education. 4 (2): 105–116. arXiv:2002.04916. doi:10.21831/reid.v4i2.20951. S2CID 149711864.
14. Sadler, Philip M., and Eddie Good. The Impact of Self- and Peer-Grading on Student Learning, p. 1.
15. Sadler, Philip M., and Eddie Good. The Impact of Self- and Peer-Grading on Student Learning, p. 16.
16. Sadler, Philip M., and Eddie Good. The Impact of Self- and Peer-Grading on Student Learning, p. 23.
17. Strong, Brent, Mark Davis, and Val Hawks. SELF-GRADING IN LARGE GENERAL EDUCATION CLASSES, p. 52.
18. Dannels, Deanna P., and Kelly Norris Martin. "Critiquing critiques: A genre analysis of feedback across novice to expert design studios." Journal of Business and Technical Communication 22.2 (2008): 135–159.
19. Goldschmidt, Gabriela, Hagay Hochman, and Itay Dafni. "The design studio “crit”: Teacher–student communication." AI EDAM 24.3 (2010): 285–302.
20. Schwartz, Daniel L., Jessica M. Tsang, and Kristen P. Blair. The ABCs of how we learn: 26 scientifically proven approaches, how they work, and when to use them. WW Norton & Company, 2016.
21. Herring, Scarlett R., et al. "Getting inspired!: understanding how and why examples are used in creative design practice." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2009.
22. Newman, Mark W., and James A. Landay. "Sitemaps, storyboards, and specifications: a sketch of Web site design practice." Proceedings of the 3rd conference on Designing interactive systems: processes, practices, methods, and techniques. ACM, 2000.
23. Cambre, Julia, Scott Klemmer, and Chinmay Kulkarni. "Juxtapeer: Comparative peer review yields higher quality feedback and promotes deeper reflection." Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018.
24. Kang, Hyeonsu B., et al. "Paragon: An Online Gallery for Enhancing Design Feedback with Visual Examples." Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018.
25. Potter, Tiffany, et al. "ComPAIR: A New Online Tool Using Adaptive Comparative Judgement to Support Learning with Peer Feedback." Teaching & Learning Inquiry 5.2 (2017): 89–113.
26. Sadler, D. Royce. "Formative assessment and the design of instructional systems." Instructional science 18.2 (1989): 119–144.
27. Andrade, Heidi, and Ying Du. Student responses to criteria-referenced self-assessment, p. 287.
28. Li, Lawrence K. Y. Some Refinements on Peer Assessment of Group Projects, p. 5.
29. Li, Lawrence K. Y. Some Refinements on Peer Assessment of Group Projects, p. 8.
30. Li, Lawrence K. Y. Some Refinements on Peer Assessment of Group Projects, p. 9.
31. Ryan, Gina J., et al. Peer, professor and self-evaluation of class participation, p. 56.
32. Raman, Karthik, and Thorsten Joachims. "Methods for ordinal peer grading." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014.
33. Chen, Xi, et al. "Pairwise ranking aggregation in a crowdsourced setting." Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013.
34. Kotturi, Yasmine, et al. "Rising above Conflicts of Interest: Algorithms and Interfaces to Assess Peers Impartially." 2013.
35. Noothigattu, Ritesh, Nihar B. Shah, and Ariel D. Procaccia. "Choosing How to Choose Papers." arXiv preprint arXiv:1808.09057 (2018).
36. Sadler, Philip M., and Eddie Good. The Impact of Self- and Peer-Grading on Student Learning, p. 9.

References

Andrade, Heidi, and Ying Du. "Student responses to criteria-referenced self-assessment." Assessment & Evaluation in Higher Education 32.2 (2007): 159–181.
Gopinath, C. "Alternatives to Instructor Assessment of Class Participation." Journal of Education for Business 75.1 (1999): 10.
Li, Lawrence K. Y. "Some Refinements on Peer Assessment of Group Projects." Assessment & Evaluation in Higher Education 26.1 (2001): 5–18.
Lin-Agler, Lin Miao, DeWayne Moore, and Karen M. Zabrucky. "EFFECTS OF PERSONALITY ON METACOGNITIVE SELF-ASSESSMENTS." College Student Journal 38.3 (2004): 453–461.
Malehorn, Hal. "Ten measures better than grading." Clearing House 67.6 (1994): 323.
Mok, Magdalena Mo Ching, et al. "Self-assessment in higher education: experience in using a metacognitive approach in five case studies." Assessment & Evaluation in Higher Education 31.4 (2006): 415–433.
Ngar-Fun, Liu, and David Carless. "Peer feedback: the learning element of peer assessment." Teaching in Higher Education 11.3 (2006): 279–290.
Ryan, Gina J., et al. "Peer, professor and self-evaluation of class participation." Active Learning in Higher Education 8.1 (2007): 49–61.
Sadler, Philip M., and Eddie Good. "The Impact of Self- and Peer-Grading on Student Learning." Educational Assessment 11.1 (2006): 1–31.
Searby, Mike, and Tim Ewers. "An evaluation of the use of peer assessment in higher education: A case study in the School of Music." Assessment & Evaluation in Higher Education 22.4 (1997): 371.
Strong, Brent, Mark Davis, and Val Hawks. "SELF-GRADING IN LARGE GENERAL EDUCATION CLASSES." College Teaching 52.2 (2004): 52–57.
van den Berg, Ineke, Wilfried Admiraal, and Albert Pilot. "Peer assessment in university teaching: evaluating seven course designs." Assessment & Evaluation in Higher Education 31.1 (2006): 19–36.
Raman, Karthik, and Thorsten Joachims. "Methods for ordinal peer grading." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014.
Chen, Xi, et al. "Pairwise ranking aggregation in a crowdsourced setting." Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013.
Kotturi, Yasmine, et al. "Rising above Conflicts of Interest: Algorithms and Interfaces to Assess Peers Impartially." 2013.
Noothigattu, Ritesh, Nihar B. Shah, and Ariel D. Procaccia. "Choosing How to Choose Papers." arXiv preprint arXiv:1808.09057 (2018).
Kulkarni, Chinmay E., Michael S. Bernstein, and Scott R. Klemmer. "PeerStudio: rapid peer feedback emphasizes revision and improves performance." Proceedings of the second (2015) ACM conference on learning@ scale. ACM, 2015.
Dannels, Deanna P., and Kelly Norris Martin. "Critiquing critiques: A genre analysis of feedback across novice to expert design studios." Journal of Business and Technical Communication 22.2 (2008): 135–159.
Schwartz, Daniel L., Jessica M. Tsang, and Kristen P. Blair. The ABCs of how we learn: 26 scientifically proven approaches, how they work, and when to use them. WW Norton & Company, 2016.
Herring, Scarlett R., et al. "Getting inspired!: understanding how and why examples are used in creative design practice." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2009.
Newman, Mark W., and James A. Landay. "Sitemaps, storyboards, and specifications: a sketch of Web site design practice." Proceedings of the 3rd conference on Designing interactive systems: processes, practices, methods, and techniques. ACM, 2000.
Cambre, Julia, Scott Klemmer, and Chinmay Kulkarni. "Juxtapeer: Comparative peer review yields higher quality feedback and promotes deeper reflection." Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018.
Kang, Hyeonsu B., et al. "Paragon: An Online Gallery for Enhancing Design Feedback with Visual Examples." Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 2018.
Potter, Tiffany, et al. "ComPAIR: A New Online Tool Using Adaptive Comparative Judgement to Support Learning with Peer Feedback." Teaching & Learning Inquiry 5.2 (2017): 89–113.
Sadler, D. Royce. "Formative assessment and the design of instructional systems." Instructional science 18.2 (1989): 119–144.
Goldschmidt, Gabriela, Hagay Hochman, and Itay Dafni. "The design studio “crit”: Teacher–student communication." AI EDAM 24.3 (2010): 285–302.