Quicker and more reliable? Surely not . . .
Does ‘comparative judgement’ provide a more reliable – and quicker – way to mark written work? Daisy Christodoulou certainly thinks so.
Why is it so hard to assess writing?
When I taught secondary English, I always found grading essays hard. To begin with, I thought this was because I was a new teacher, but I soon realised that even experienced teachers struggled. Moderation meetings were often interesting, sometimes argumentative, but never seemed to provide grading consistency in which I could have confidence.
The research backs up my experiences. The traditional method of assessing written tasks like narratives and essays is for markers to grade each piece of writing using a rubric or mark scheme. This method has been widely used for decades in many different countries. The problem is that it is quite hard to get people to agree on how the rubric should be interpreted. Even with lots of training and experience, you will still find markers disagreeing on the grade a script deserves.
In fact, some research shows that markers disagree with themselves on the grade a script deserves when asked to mark it at two different times![i] And the levels of disagreement can be quite startling. Recent research by the UK-based regulator, Ofqual (the Office of Qualifications and Examinations Regulation), showed that on a typical 40-mark exam question, markers will disagree by about +/- 5 marks. That is, one marker might give an essay 15 out of 40 and another might give it 25, and they could both be within the margin of error.[ii]
A new alternative: comparative judgement
Comparative Judgement is a different way of assessing writing. Instead of using a mark scheme, teachers read two pieces of writing and decide which of the two they think is better. Each teacher makes a series of these judgements, and many different teachers take part in judging. These judgements are then combined to place every piece of writing on a consistent measurement scale. If necessary, grades can be layered on top of this scale.
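To make the "combining" step concrete, here is a minimal sketch of one common way pairwise judgements can be turned into a scale: fitting a Bradley–Terry model with a simple iterative algorithm. The article does not specify which statistical model No More Marking uses, and the scripts and judgements below are entirely made up for illustration.

```python
# A hypothetical illustration: combining pairwise "which is better?"
# judgements into a single scale with the Bradley-Terry model,
# fitted by the classic MM (minorise-maximise) iteration.
from collections import defaultdict

def fit_bradley_terry(judgements, iterations=100):
    """Estimate a 'quality' score per script from (winner, loser) pairs."""
    wins = defaultdict(int)    # wins[s] = comparisons script s won
    pairs = defaultdict(int)   # pairs[{i, j}] = comparisons between i and j
    scripts = set()
    for winner, loser in judgements:
        wins[winner] += 1
        pairs[frozenset((winner, loser))] += 1
        scripts.update((winner, loser))

    strength = {s: 1.0 for s in scripts}
    for _ in range(iterations):
        new = {}
        for i in scripts:
            denom = 0.0
            for j in scripts:
                if i == j:
                    continue
                n = pairs[frozenset((i, j))]
                if n:
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())  # normalise so the scale is comparable
        strength = {s: v * len(new) / total for s, v in new.items()}
    return strength

# Made-up judgements: each tuple is (winner, loser).
judgements = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]
scale = fit_bradley_terry(judgements)
ranking = sorted(scale, key=scale.get, reverse=True)
```

Even in this toy example, no judge ever assigns an absolute mark: the scale emerges purely from many small comparative decisions, which is what allows grades to be layered on afterwards.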
Too subjective – surely?
This process may seem very subjective, but the data show that it’s much more reliable and quicker than traditional marking. The typical margin of error is just +/- 1 or 2 marks, and judges are typically able to get through their judging quota in about half the time it takes to mark a set of essays traditionally.[iii]
Why it works
Why does Comparative Judgement work? Because of an important psychological principle: humans are innately much better at comparative judgements than absolute ones. This is true not just of marking essays but of all kinds of absolute judgement. For example, if you are given a shade of blue and asked to rate how dark it is on a scale of 1 to 10, or given a line and asked to estimate its exact length, you will probably struggle to be precisely right.
However, if you are given two shades of blue and asked to find the darker one, or two lines, and asked to find the longer one, you will find that much easier. Absolute judgement is hard; comparative judgement is much easier, but traditional essay marking works mainly on the absolute model.[iv]
Find out more
At No More Marking, where I am Director of Education, we have used Comparative Judgement to assess nearly a million pieces of writing over the last five years. We run projects in England & Wales, Australia and the US, and we also have lots of British International schools who take part in our England & Wales project and receive results that are standardised against the student cohort in England.
As well as the gains in reliability and efficiency, we’ve also used Comparative Judgement to improve the way writing is taught. Our collection of real student writing has allowed us to identify the aspects of writing students find particularly difficult, and to design teaching resources that focus on improving their understanding.
Daisy Christodoulou is the author of several books, including Seven Myths About Education. She is Director of Education at No More Marking.
If you would like to learn more, No More Marking are running an introductory webinar for international schools on Friday 14th October, 2022, at 10am UK time: to register click on the snail.
Feature Image: by Iulian Ursache from Pixabay
Examples of work & graphics kindly provided by Daisy.
[i] See, for example, Meadows, Michelle, and Lucy Billington. “A review of the literature on marking reliability.” London: National Assessment Agency (2005).
[ii] Ofqual, Marking consistency metrics, 2016. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/681625/Marking_consistency_metrics_-_November_2016.pdf
[iii] Ofqual, Marking consistency metrics, an update, 2018, p. 27. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/759207/Marking_consistency_metrics_-_an_update_-_FINAL64492.pdf
[iv] Laming, Donald. Human judgment: the eye of the beholder. Cengage Learning EMEA, 2003.