Every few weeks, another university releases a new AI policy. Detection tools are being procured. Academic integrity committees are meeting more frequently than ever. Turnitin has pivoted its entire product roadmap.
And underneath all of this activity is a question that almost nobody is asking: what if we’re solving the wrong problem?
The sector’s dominant response to generative AI has been defensive. Detect it. Deter it. Discipline students who use it without disclosure. I understand the instinct — academic integrity matters, and the pressure on institutions to do something is real. But in our rush to defend existing assessment practices, we’ve skipped over a more important question: are those practices worth defending in the first place?
What assessment is actually for
Assessment in higher education is supposed to do one thing: generate valid evidence that a student has developed the capability the program claims to develop. That’s it. Everything else — the essay format, the word count, the exam conditions, the submission deadline — is infrastructure. It’s the mechanism we’ve built to serve that purpose.
The problem is that over time, in many programs, the infrastructure became the point. We became very good at designing assessments that are administratively manageable, academically defensible, and easy to mark consistently. We became much less good at asking whether those assessments actually tell us what we need to know about what a student can do.
AI has made this problem impossible to ignore. When a language model can produce a competent 2,000-word essay on almost any topic in thirty seconds, the essay stops being reliable evidence of the student’s capability. It becomes evidence of the student’s ability to prompt an AI — which may or may not correlate with the capability you’re actually trying to develop.
This isn’t a reason to ban AI. It’s a reason to redesign the assessment.
The question underneath the question
What I keep coming back to is this: if an AI can complete your assessment, what does that tell you about the assessment?
In some cases, it tells you that the task was always too easily separable from genuine understanding — that it was testing recall or surface-level synthesis rather than the kind of integrated, applied capability that actually transfers to a workplace or a life. In those cases, AI hasn’t broken the assessment. It’s revealed that the assessment was already broken.
In other cases, it tells you something more interesting: that the evidence of capability you were collecting was always a proxy, and that proxy is now unreliable. You weren’t testing whether students could think — you were testing whether they could produce a particular kind of document. The document and the thinking used to be correlated enough that the proxy worked. Now they’re not.
Neither is a comfortable conclusion for institutions to sit with. But both are honest ones.
What a better question looks like
Instead of “how do we detect AI use?”, the more productive question is: “what would constitute valid, observable evidence that this student has actually developed this capability?”
Sometimes the answer is oral assessment — a conversation with an examiner that can’t be outsourced to a language model. Sometimes it’s portfolio-based evidence accumulated over time, where the process of development is as visible as the product. Sometimes it’s work-integrated assessment, where the evidence is generated in a real professional context. Sometimes it’s a redesigned written task that requires the student to apply reasoning to a genuinely novel problem — one that rewards the kind of thinking AI currently can’t replicate.
None of these are new ideas. Learning designers have been advocating for authentic, capability-centred assessment for years. What’s new is that AI has removed the option of not taking this seriously.
A note on equity
One thing that gets lost in the integrity conversation is that authentic assessment reform, done well, is also an equity intervention. High-stakes, time-pressured, decontextualised exams have always disadvantaged students from particular backgrounds — first-generation university students, students with learning differences, students working full-time alongside their studies. Assessment designs that generate richer, more contextualised evidence of capability tend to be more equitable as well as more valid.
This is worth naming, because the pressure to crack down on AI use can easily tip into a pressure to revert to the most surveillance-heavy, controlled assessment conditions possible. That response has real costs for students who were already disadvantaged by those conditions.
What I’d like to see
I’d like to see institutions spend at least as much energy redesigning assessment as they spend on detection. I’d like to see the AI integrity conversation happen in the same room as the learning design conversation, rather than in separate committees with separate briefs. And I’d like to see the sector treat this moment as an invitation — because that’s genuinely what it is.
AI hasn’t created the problem. It’s accelerated a reckoning that was already overdue.
The question was never “did the student use AI?” The question has always been “do we actually know what this student can do?” We just didn’t have to answer it honestly until now.
Meg Knight is Director of Learning & Operations (International) at Online Education Services (OES). She writes about online education, learning design, and the future of higher education.