Do we need to defend Assessment from Artificial Intelligence?


Michael Henderson – @mjhenderson – Professor of Digital Futures, and Director of the Educational Design and Innovation Research Hub in the Faculty of Education at Monash University.

Over the past months I have been invited to present my thoughts on the challenge and opportunities of AI for assessment design. This short essay is an adaptation of one of these presentations.

Almost every meeting I have attended for months has, in some way, referred to AI. I shouldn’t really be surprised given that one of my key research areas is assessment and feedback. However, the conversations are usually polarised – on one hand focusing on issues of cheating, and on the other, focusing on how technologies like AI can enable improved or even novel forms of teaching and learning interactions and practices. Indeed, I am fascinated by how AI simultaneously extends our creative capabilities (as learners, educators and designers) but also challenges our claims to creativity. Interestingly, these concerns about technologies – about circumventing the systems in place, and stimulating novel opportunities – are not new. Read any history of educational technology and you will find similar debates from the early days of calculators and television. Nevertheless, just because the debates around AI feel familiar, we shouldn’t discount the significance or potential of the technology.

AI has given the Higher Education sector a jolt – but let’s not kid ourselves that we are special in this regard. From schools to workplaces, the implications of generative AI are significant. We are forced to consider the difference between the shape of what is valued (the essay, the business strategy report) and what is valuable (the qualities of the product, including how it helps us to achieve solutions and satisfy needs).

In HE many of our assessment designs – from essays to exams – can be tackled by generative AI with outcomes that, at first glance, may seem acceptable or “good enough” (to the novice at least).

In a world of “good enough” – when the language, shape, format, and general collection of ideas, concepts and solutions can all be brought together through AI – we need to remind ourselves about what matters.

Should we be trying to avoid, limit or ignore AI – or will we take up the challenge of saying that “good enough” is simply not good enough?

As educators and educational designers we face a difficult challenge (which can also act as a heuristic for our designs): if our educators’ evaluative expertise, applied through the assessment criteria and standards, cannot distinguish between raw AI outputs and what students bring to the task (with or without AI), then we have a problem with our assessment designs – not with AI.

There is a tendency to shy away from this deeper challenge – and instead many people are talking about how to defend assessment from AI. We seem to have collectively forgotten, or perhaps more accurately are frightened to acknowledge, that our assessments are simply proxies (and often poor ones at that) of learning. I am not convinced that all of our assessment designs should be, or deserve to be, defended. A huge issue is that many of our assessments do not allow the accurate measurement of learning outcomes. It is a fundamental challenge inherent in all assessment practices.

However, beyond that challenge of valid measurement, we also need to recognise that academic integrity has been an issue for as long as assessment has existed. Current research – depending on who you read and what context you are interested in – reports that anywhere from 3% to 60% of HE students cheat.

For this reason, I am cautious of calls for us to “go back” to exams and exam halls – as a way to defend assessment against AI.

Exams are not impervious to cheating – this is very much demonstrated in my own research and that of others. More importantly, while exams can sometimes be useful contexts to demonstrate key knowledge – they are often not great as a design for students to demonstrate creative and critical thinking. With this in mind, the call to shift back to exams is worrying – it is unlikely to stop Generative AI misuse but at the same time it can undermine the integrity of our own educational designs and learning goals. Ironically, this same strategy effectively means we are trying to stop student integrity breaches by breaching our own educational integrity.

However, we cannot simply look the other way. We do need to care greatly about the integrity of assessment – but there seems to be a lot of attention being given to trying to find quick fixes: ways to exclude or circumvent generative AI use, without fundamentally revisiting what we are trying to assess. Instead we should perhaps be asking ourselves a relatively simple question:

“Why should we assess something that AI can do just as well or better?”

This question is problematic but it is useful as a provocation to help us focus on considering if and how we can assess what the human brings to the task. This may – for example – be creative thinking, contextually relevant problem solving, empathy, coherence of purposeful argumentation, etc.

The implication is that we could or should focus less on the work of summarising information, word lengths, report formats, grammatical structures, and expression, all of which generative AI can do – and in which it will become increasingly proficient. If we do want to assess these things, then it is arguably more appropriate to focus on the judgements of students in terms of what they believe is good quality. Perhaps it is less about who (or what) created the sentence, and more about the student’s skill in deciding that the sentence (or whatever is being produced) meets the criteria and satisfies other judgements of quality.

Indeed, rather than focusing on the shape of the thing, let’s focus more on creative assessment designs – that seek to elicit evidence of learning that matters.

So what matters?

Simply put: to be able to make judgements of quality.

This may look different according to learning goals, disciplines and curriculum contexts. For example, the issue of quality may be in relation to recognising the characteristics of effective, efficient, accurate, factual, novel, defensible, values-based, or ethical, responses to a problem or task.

The problem with generative AI is that it applies structures and content – often independently – resulting in material that we need to question in terms of authority, accuracy, quality, appropriateness, and creativity. Questioning material in this way calls on complex, critical thinking skills.

So when we talk about the use of AI in assessment let’s keep this in mind – are we assessing things that AI can do easily? Or are we assessing the uniqueness of human processing?

There has been a lot of talk about how we might incorporate AI into our assessment designs (rather than exclude it).

Many institutions have – quite rightly – recognised the potential value of generative AI, as well as the futility of trying to exclude it. It is becoming common to hear that HE institutions are including AI clauses in their assessment policies, under which its use is acceptable and in some cases even encouraged.

This is exciting. There are many ways Gen AI can be used.

  • First, we can look at our assessment designs and consider what labours are involved that can be offloaded or co-constructed with AI. For example – the communication or summary of well-established ideas.
  • Second, we can use generative AI to support and prompt learners to use established structures, overcome writer’s block, and return, collate or suggest materials to stimulate thinking.
  • Third, we can look at how we might use generative AI to inspire novel thinking, test ideas, explore hypotheses, quickly generate material and become a sounding board. We could ask Gen AI to produce comparative texts, alternative responses, or to suggest if there are other arguments – to be a devil’s advocate.

But let’s be cautious in our optimism.

It is easy for us to paint an overly romantic picture of how we might use Generative AI in the process of assessments. You don’t have to go far to engage in conversations that variously frame AI as a benevolent mentor, a generous co-writer, or as a dutiful assistant… but the application of generative AI in learner assessments requires considerable skill and arguably new forms of literacy.

Some have called this prompt engineering – knowing how to skilfully prompt and guide the AI to the production of desirable outputs. However, this makes the skill sound technical in nature. Simply knowing how to prompt the system is not enough. The fundamental challenge for students (and educators) is to know when something is good in quality – to know when the prompting is sufficient, what is lacking, what needs improvement and importantly, when it is not appropriate.

This ability to make judgements of quality essentially requires students (and educators) to have a degree of:

  • domain specific knowledge to enable them to recognise errors, lack of depth, etc.
  • critical thinking skills to enable them to make judgements of quality
  • depth of understanding about the nature of the problem that is being tackled in order to know when something is more than simply passable

So to come full circle – everyone wants to know what creative strategies we can employ to defend assessment from an AI world – but I suggest we should instead be thinking about designing assessment for an AI world.

Product is a poor proxy of learning – process is perhaps more revealing. The challenge then is to develop assessment designs that can help reveal the process: the thinking, the strategies, and the evaluative choices. At the simplest level – assessment designs that ask students to explain, extend, contest, use or defend their solution to a task are likely to be a good starting point.

The above challenge is significant. In HE there is a large proportion of teaching staff who are sessional or out of field and thereby less able to discern the differences in quality outputs. If we are faced with a sea of “good enough”, then we need to consider how we can refine our criteria to more accurately differentiate between the quality of student performance. However, simply because this is a challenge does not mean we should give up. There are valuable reasons for the use of Gen AI in assessment designs including equity.

We need to be cautious of knee-jerk reactions that may damage the progress we have made over recent decades in inclusive learning and assessment designs. The move to find ways to exclude or limit the use of AI in assessment, such as the calls for more exams as a way to defend against AI, necessarily means that we lose the affordances of AI.

A case for inclusion and equity: Generative AI may provide opportunities for students who are differently able to engage in ways they may otherwise have found more difficult. Gen AI can enable and support students who have lower literacy skills, operate in languages other than English, or suffer from chronic illnesses. They can use generative AI to help them work through the barriers of production – to allow them more time to engage in the valued skills of critical thinking and evaluation.

In this respect there is the potential for AI to be a leveller – an exciting opportunity for us to consider how we can use generative AI to increase access for all learners, and enable them to engage in authentic assessments with fewer impediments.

On the other hand – generative AI can propagate and reinforce inequity – this raises two issues for me:

  • First, I can easily see that Generative AI services will inevitably become a subscription service – with the highest rates connected with the most powerful services – the most natural language outputs, most effective or comprehensive source databases, personalised responses, context aware, etc. The costs of such subscriptions may simply result in a continued socio-economic divide in educational outcomes.
  • Second, Generative AI is an echo chamber – albeit a very big one. It does not know the difference between fact and lie. It does not know the implications of biased language or stereotyped representations. It is not – in its current form – values driven, except for the embedded values of the programmers and the large language model it draws on. In this echo chamber we can see the reinforcement and repetition of exclusionary, harmful and unethical ideas.

A few concluding thoughts:

Rather than defend assessment from AI, let’s revisit what we are trying to assess. Let’s use AI to support inclusion, enable opportunities, and challenge students to demonstrate their critical awareness of issues of quality. The most creative assessment design is perhaps to focus less on the production of content (e.g. creation of an essay) and instead challenge students to evaluate, correct, extend and improve upon the outputs of Generative AI for specific authentic purposes.

– 16 May 2023 –

Prof Michael Henderson – Professor of Digital Futures, and Director of the Educational Design and Innovation Research Hub in the Faculty of Education at Monash University. Michael has a broad range of interests, all of which, unsurprisingly, are being shaped, challenged or enhanced by AI developments. These include: (a) student creativity and creative risk taking, (b) innovative educational designs including assessment and feedback designs.