Adaptive Systems, Adapted Children
Who gets human judgement in an automated system
You’re waiting for surgery.
A doctor appears with a consent form for a new procedure.
It might help, they say.
The research is preliminary. The mechanism isn’t fully understood. It hasn’t yet been tested on people like you, your demographic, your profile. But the efficiency gains look promising, and a trial needs participants.
You ask what “help” means. Compared to what? Measured how? And what happens if it doesn’t work?
You would not proceed without answers.
The announcement
The government has announced today that AI tutoring could benefit 450,000 disadvantaged pupils.
The tools will “adapt to individual pupils’ needs”, “provide extra help when they get stuck” and “identify where they need more practice to master their lessons”. The goal is for pupils to “catch up with their peers”.
This sounds uncontroversial. Benevolent, even. Extra help, tailored support, closing gaps. Especially for those who are disadvantaged. Who would object to that?
But it raises a question we should be asking ourselves.
Do we want technology that adapts to children, or children who must adapt to the technology?
The answer to that is determined by design.
The announcement promises tools will be “co-created with teachers” and “robustly tested” for safety. But once the underlying model of learning is fixed, co-creation and testing operate within it, not on it.
This is not a new concern. For years, the AI in Education research community has documented how systems trained on typical learning trajectories handle difference not by adapting, but by normalising it away. As Kaska Porayska‑Pomsta (2024) cautions, when learning diverges from what a system is built to recognise, the system does not adapt; it stabilises itself. Who bears the cost of that stability? The learner, particularly where learning is least predictable.
The system does not shape to the learner.
The learner is shaped to fit the system.
Those warnings do not appear here. The announcement assumes the questions have been answered. That we know what learning is, what progress means, what support serves it. That scale and efficiency are the remaining problems, not epistemology or ethics.
“Help” is not a neutral category. Get those assumptions of support wrong (or worse, leave them unexamined to hit the six-month deadline you’ve set yourself) and you don’t get efficiency at scale. You get the wrong thing, delivered consistently, to those with the most at stake and the least capacity to object.
What teaching requires
A child says “twelve.”
Did they count? Did they remember? Did they guess? Are they consolidating something solid or repeating something half-understood? The answer itself tells you almost nothing. What matters is what is hidden.
This is what teaching is: trying to see what you cannot directly observe inside the head of a four-year-old. You watch for hesitation. You notice tone. Gestures. Expression. You remember what worked last week and what didn’t. You are probably wrong, constantly. But you can sense when you’re wrong, and that doubt itself becomes information. You check. You adapt.
What AI cannot do
AI systems also interpret learners, but they tend to do so through what can be measured instantly: right answers, wrong answers, response times, error repetition. From this they build a model and act on it. If the model is uncertain, that uncertainty is resolved algorithmically. The system picks a path and continues.
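To make that concrete, here is a deliberately toy sketch, in Python, of the kind of loop described above. Every name, threshold, and update rule is invented for illustration; it describes no specific product, least of all the tools in the announcement.

```python
# A toy adaptive-tutoring loop. Every skill name, threshold, and update
# rule below is invented for illustration; no real product works this way.

from dataclasses import dataclass, field

@dataclass
class LearnerModel:
    # The system's entire picture of the child: one number per skill.
    mastery: dict = field(default_factory=lambda: {"counting": 0.5, "addition": 0.5})

    def update(self, skill: str, correct: bool, response_ms: int) -> None:
        # Fold in the only signals the system can see: correctness and
        # speed. Hesitation, tone, and gesture never enter the model.
        step = 0.10 if response_ms < 5000 else 0.05
        self.mastery[skill] += step if correct else -step
        self.mastery[skill] = min(1.0, max(0.0, self.mastery[skill]))

def next_item(model: LearnerModel) -> str:
    # Uncertainty is resolved algorithmically: the weakest estimate wins,
    # the system picks a path and continues. There is no branch for
    # "stop, something feels off" or "this struggle may be productive".
    return min(model.mastery, key=model.mastery.get)

# A child answers a counting item incorrectly, slowly.
model = LearnerModel()
model.update("counting", correct=False, response_ms=9000)
print(next_item(model))  # -> "counting": more of the same, immediately
```

Even this toy makes the point: whatever the model cannot represent simply never enters the loop.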
It cannot do what human interpretation does: stop because something feels off. Question its own reading. Wait because maybe the struggle is necessary. Revise its understanding. These moves depend on human judgement, and on accepting that you might be wrong about what is happening, which is what makes adapting possible.
And this is the problem. What matters most may resist simple measurement.
Some of the most important features of learning do not register as progress in the moment. Performance can dip before it improves. Understanding may emerge through confusion. Agency develops by learning to stay with difficulty, not by having it removed.
Systems designed to optimise measurable performance (without a human in the loop) don’t really have a way to distinguish these processes from error. What they cannot represent, they resolve. In doing so, they stabilise performance by smoothing away forms of struggle that are not obstacles to learning, but part of it.
AI tutoring may improve performance. But what kind of performance? And is that performance what a human would recognise, simply, as learning?
The uneven distribution
And for whom does this definition of learning become acceptable?
Because educational inequality, really, is not only about access. It is about what becomes acceptable for different children. When we decide that automated support is an acceptable substitute for human judgement for disadvantaged children, we aren’t really closing gaps. We’re re-drawing the baseline. We are saying that some children are entitled to human attention as standard, whilst others are expected to make do with systems designed to approximate it.
The problem, truly, is not the presence of technology but the way its limits are assigned to particular children. When a tool encodes a particular philosophy of learning, it also determines who is expected to live with what it cannot do. And when those limits concern judgement, interpretation, and care, the work at the heart of learning, the consequences are not shared evenly.
What the evidence can and cannot tell us
It is tempting, when responding to announcements like this, to divide the world neatly into optimism and scepticism. Either technology will transform learning, or it is destined to disappoint. Get with the times, or be a Luddite.
Decades of research on educational apps suggest that digital tools can support learning. But they do so conditionally, unevenly, and within bounds.
When positive effects are found, they tend to be tightly coupled to the specific skills the app targets. Practice improves performance on practiced tasks. Number games strengthen number skills. Pattern drills improve speed and accuracy. This is not trivial, and for some learners it can be genuinely helpful. But it is largely a story of near transfer: improvement that stays close to the original activity.
Evidence for broader generalisation is thinner. Gains in conceptual understanding, problem-solving, or long-term retention are less consistently observed, and when they do appear, they are often sensitive to design choices that are easy to overlook in policy discourse. Explanatory feedback matters more than praise. Structured progression matters more than surface personalisation. Integration into classroom routines matters more than the app itself.
In other words, how the technology is designed and used matters at least as much as whether it is used.
This should give pause to claims framed primarily in terms of reach. Scale does not amplify all mechanisms equally. What scales most reliably are forms of practice and correction. What scales poorly are judgement, interpretation, and responsiveness to context.
There is also the question of time. Much of the evidence for educational apps relies on outcomes measured immediately after an intervention. Fewer studies examine whether gains persist months or years later, and those that do report mixed results. Some effects endure. Others fade. Some disappear entirely once the novelty or structure of the intervention is removed.
This matters because learning is not simply what is visible at the end of a session. It is about what remains available to the learner when support is withdrawn.
Claims about benefit for disadvantaged pupils require even greater care. While there is some evidence that learners who are struggling can show substantial short-term gains with well-designed apps, the methods used to identify “underachievement” are often crude, and effects can be inflated by statistical artefacts rather than genuine change. More importantly, there is little evidence that such tools reliably reshape a learner’s long-term relationship with learning itself.
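One such artefact is regression to the mean, and it is worth seeing in miniature. The Python sketch below (all numbers invented) screens for “underachievers” using the bottom fifth of a noisy pre-test, then re-tests them with no intervention at all. The group “improves” anyway.

```python
# Regression to the mean in miniature: select "underachievers" on a noisy
# pre-test, re-test with NO intervention, and watch apparent gains appear.
# All parameters are invented for illustration.

import random

random.seed(1)
N = 10_000
ability = [random.gauss(50, 10) for _ in range(N)]   # true, stable ability
pre  = [a + random.gauss(0, 8) for a in ability]     # noisy pre-test
post = [a + random.gauss(0, 8) for a in ability]     # noisy post-test; nothing has changed

# Crude "underachievement" screen: bottom 20% of pre-test scores.
cutoff = sorted(pre)[N // 5]
chosen = [i for i in range(N) if pre[i] <= cutoff]

gain = sum(post[i] - pre[i] for i in chosen) / len(chosen)
print(f"Apparent gain with no intervention: {gain:+.1f} points")
```

Every point of that apparent gain is noise. This is why evaluations need comparison groups selected the same way, not before-and-after scores alone.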
The evidence, taken as a whole, does not condemn educational technology. But neither does it justify the confidence often projected onto it.
What it suggests instead is something more modest and more demanding: that digital tutoring tools can support learning when they are narrowly targeted, carefully designed, thoughtfully implemented, and held within a broader educational ecology that does not confuse improvement with understanding.
It reminds us that learning technologies do not fail because they do nothing. They fail because they do something very well, and we mistake that something for everything.
Who is being consulted?
The research on this exists. It is decades deep. It is substantial.
The AI in Education research community has documented what these tools can and cannot do, under what conditions they help, and where the risks lie. That knowledge is readily accessible. It should inform policy.
The government has set a six-month deadline for co-creation with teachers. The question is whether it will also consult the researchers who have spent careers understanding how learning works, what interpretation requires, and what happens when systems designed for efficiency encounter the complexity of actual children.
When systems are introduced at scale for 450,000 children, the evidence base should be proportionate. Short-term gains on narrow measures are not the same as long-term support for learning. Promising efficiency is not the same as demonstrating benefit. And when interventions are tested primarily on disadvantaged children, someone should be asking whether that is equity or expedience.
The tools will be built. The pilots will run. What remains uncertain is whether the government will draw on the expertise that exists to shape what gets built, or proceed as if decades of research do not apply.
The traces will remain long after the software ships, the pilot ends, and the Education Secretary moves on.
And those traces are permanent.
References
Department for Education (2026). 450,000 disadvantaged pupils could benefit from AI tutoring tools. GOV.UK.
Porayska-Pomsta, K. (2024). From algorithm worship to the art of human learning.
Porayska-Pomsta, K. (2024). A Manifesto for a Pro-Actively Responsible AI in Education. International Journal of Artificial Intelligence in Education, 34, 73–83. https://doi.org/10.1007/s40593-023-00346-1
Outhwaite, L. A., Early, E., Herodotou, C., & Van Herwegen, J. (2022). Can maths apps add value to young children’s learning? A systematic review and content analysis. London: Nuffield Foundation.

