Rater effects on essay scoring: A multilevel analysis of severity drift, central tendency and rater experience

Authors
Leckie, G. and Baird, J.-A.
Year
2011
Journal
Journal of Educational Measurement, 48:4, 399-418
DOI
10.1111/j.1745-3984.2011.00152.x
Abstract

This study examined rater effects on essay scoring in an operational monitoring system from England's 2008 national curriculum English writing test for 14-year-olds. We fitted two multilevel models and analysed: (1) drift in rater severity effects over time; (2) rater central tendency effects; and (3) differences in rater severity and central tendency effects by raters' previous rating experience. We found no significant evidence of rater drift and, while raters with less experience appeared more severe than raters with more experience, this result was also not significant. However, we did find a central tendency in raters' scoring. We also found that rater severity was significantly unstable over time. We discuss the theoretical and practical questions that our findings raise.

Number of levels
3
Multivariate response model?
No
Longitudinal data?
Yes
Paper submitted by
George Leckie, Graduate School of Education, University of Bristol, g.leckie@bristol.ac.uk