How Do We Know What People Know?
The assessment crisis no one wants to name — and why every institution is facing it at the same time
Last week, the New York Times reported that the digital SAT — the version specifically redesigned to be tamper-proof — is already vulnerable to new forms of cheating. Sites in China are selling test questions. Online forums offer software that bypasses the Bluebook app’s lockdown protections. Some services will remotely control a student’s device during the exam while the student sits there and pretends to take the test.
The College Board saw this coming. When they moved to a digital format in 2024, security was the whole point. Randomized questions. Adaptive difficulty. Locked-down software. No more recycled test booklets floating around the world.
Three years later, the cheating adapted. It always does.
What happened with the SAT is a small, clean example of something much bigger. Right now, every institution that evaluates human beings — K–12 schools, universities, admissions offices, employers, corporate training departments — is facing the same crisis at the same time. Most of them think they’re dealing with a local problem. The teacher worries about homework. The professor worries about essays. The recruiter worries about résumés. The compliance officer worries about training completion rates. They don’t often talk to each other. They should, because they’re all confronting the same question:
How do we know what people know?
The Link That Broke
For most of the twentieth century, assessment worked on a simple assumption: completing the task required doing the thinking. If a student submitted an essay, they probably wrote it. If a job applicant submitted a polished cover letter, they probably had the writing skills it demonstrated. The act of production and the act of understanding were bound together.
That link has been severed. Not by AI alone — people have always cheated; the SAT had impersonation rings back in 2011, and students have been paying others to write their papers for centuries. What AI has done is make the workaround trivially easy, universally accessible, and nearly undetectable. And that changes the math for everyone.
College Board research from 2025 found that 84 percent of high school students now use generative AI for schoolwork. A student quoted in EdSource put it plainly: “AI is built to answer prompts. So is homework. Of course students are cheating.” That student isn’t wrong. The assignments weren’t designed to withstand a tool that could complete them without understanding. They were designed for a world where the labor and the learning were the same thing.
In higher education, a UK university study published in PLOS ONE injected entirely AI-written submissions into the exam system across five psychology modules. Ninety-four percent went undetected. The AI submissions scored, on average, half a grade boundary higher than real students. A high school English teacher profiled by NPR described what this looks like in practice: after letting students use AI to write thesis statements, she realized they couldn't engage with the material at all. "They didn't know the material because they had outsourced that level of thinking," she said, "and they didn't have to come to a conclusion or an argument about the text they were studying on their own."
In college admissions, Princeton and Amherst now require “anchored writing samples” — graded papers from high school — as a baseline for authentic writing, essentially saying: we no longer trust the essay you submit on your own. In hiring, an estimated two-thirds of job applicants now use AI to write résumés and prepare for interviews. SHRM declared in 2025 that “recruitment is broken,” noting that average cost-per-hire and time-to-hire have both increased during the very period AI adoption surged.
And in corporate training — arguably the quietest and most dangerous version of this crisis — employees are using AI to breeze through compliance programs. On paper, completion rates look healthy. The learning isn’t sticking. This matters because compliance training exists to prevent actual harm: safety violations, regulatory failures, lawsuits. If no one is actually learning the material, the organization is exposed, and it may not discover this until something goes wrong.
Friends regularly reach out to tell me their boss just assigned a training — and that they're using AI to complete the entire thing. This is in accounting, finance, law, sales, and beyond. One friend was asked to complete an ethics training and was tempted to use AI to blow through it. "Is that ironic?" they asked. "Am I a bad person?" I had a hard time telling them not to. I remember completing those trainings myself. I would have been right behind them.
The Premium Is Friction
Carlo Rotella is an English professor at Boston College and the author of What Can I Get Out of This?, a book about what actually happens when students are asked to do hard intellectual work together in a room, with no devices, no shortcuts, and no way to hide. For the first time this century, he’s giving in-class exams. Blue books are back.
Rotella doesn’t ban AI out of fear. He explains to his students that they’re paying roughly five dollars a minute for classes at Boston College, and spending that time practicing to be replaceable by a machine is a waste of their money and his time. “The entire point of this class is the labor,” he says. “It’s like joining the track team and doing your laps on an electric scooter.”
As Rotella puts it: “The real premium isn’t information anymore — it’s friction.” For years, education technology promised to remove friction from learning. Make it faster, smoother, more efficient. And now we’re discovering that friction was the thing doing the work all along. The resistance. The struggle. The part where you had to actually think. With content now cheap and ubiquitous, the physical classroom — the place where people look each other in the eye, pull their weight, and make meaning together — has become a haven for exactly the things technology cannot provide.
This reframes the entire crisis. The problem isn’t that people are cheating. The problem is that we spent decades optimizing assessment for efficiency and convenience — take-home essays, online quizzes, asynchronous submissions, auto-graded modules — and in doing so, we systematically removed the friction that made those assessments meaningful. AI didn’t break assessment. It revealed that assessment was already hollowed out.
Why Detection Doesn’t Work
The instinctive response has been to catch the cheaters. Flag the AI-written submissions. Build better locks. Andrej Karpathy, formerly of OpenAI, put it bluntly: “You will never be able to detect the use of AI in homework. Full stop.” In my Writing class at Fairleigh Dickinson University, I occasionally use tools that flag whether text was typed or pasted into a document. My students know this — and they still paste in AI-generated work. That leads to some very interesting conversations.
But it also proves the point at a small scale. The College Board invested heavily in digital security and within three years, new workarounds emerged. You can keep building higher walls. People keep finding ladders. The deeper issue is that detection accepts the premise that unsupervised, text-based, asynchronous assessment still works as long as you can verify the author. It doesn’t. When anyone can produce a polished essay or a completed training module without understanding the material, the artifact itself tells you nothing about the person who submitted it.
What’s Already Working
The institutions that are getting this right have stopped asking “how do we catch AI use?” and started asking “how do we make the human’s thinking visible?”
At Caltech, students who submit research projects with their applications now appear on video and are interviewed by an AI-powered voice about their work, much like a dissertation defense. The recordings are reviewed by faculty and admissions officers. Ashley Pallie, Caltech’s dean of undergraduate admissions, describes it: “It might seem strange to use AI to get more of a human voice, but I think of it as a way to bring more authenticity into the fold.” The question the tool helps answer: “Can you claim this research intellectually?”
Notice what Caltech is not doing. They’re not scanning essays for AI fingerprints. They’ve designed an interaction where the student has to demonstrate their thinking in real time. The question has shifted from “did you write this?” to “can you defend it?”
Grading my own students’ interactions with AI (from simulations and role-playing games to direct interactions with foundation models) has produced some fascinating snapshots of student thinking that, frankly, surpass my experience grading traditional essays.
And SHRM’s advice to employers arrives at the same conclusion: “Validate a candidate’s knowledge, skills, and abilities through early, live conversations.” Same answer teachers are giving. Go live. Go interactive. Go observable.
The pattern across all of these is clear: shift from evaluating artifacts to observing cognition. From “show me what you produced” to “show me how you think.” It’s the same move, made independently, by institutions that have never compared notes.
The Skills We’re Not Measuring
Redesigning assessment isn’t just about plugging security holes. The skills we need are changing, and our assessment systems were never built to measure the new ones.
Jobs for the Future found growing prioritization of human skills — critical thinking, initiative, leadership, communication — alongside and because of AI growth. Specialized digital skills are churning faster than ever; AI is shortening their useful life. The durable skills are the human ones. HR Dive put it directly: hiring someone today without assessing their AI skills is like hiring someone in the 1990s without checking if they could use the internet. They identified three dimensions that matter: communicating effectively with AI tools, critically evaluating AI outputs, and knowing when to use AI and when human judgment is required.
That last skill is the hardest to measure. A CHI 2025 study found that knowledge workers using GenAI reduced their critical engagement, especially on routine tasks — they just accepted the output. The people who maintained critical thinking were the ones with enough cross-domain knowledge to recognize when something was wrong.
The Question We All Have to Answer
The SAT story is a microcosm. Build better technology. Watch the workaround emerge. Repeat. The lesson is not “build better locks.” The lesson is: stop designing rooms that need them.
Every institution that evaluates human beings is going to have to answer the same question. Not “how do we catch AI use?” Not “how do we go back to the way things were?” The question is: how do we design evaluation where the human has to show up?
In a classroom, it might look like Rotella’s device-free discussions where every student pulls their weight. My students will write their first three essay drafts by hand. But when we move into the second and third essays, I will incorporate AI and use comparative transcript analysis to demonstrate, study, and align on the type of behavior I want to see. We’ll also use simulations and role-playing games to navigate specific moments in the drafting process.
In admissions, it might look like Caltech’s interactive defense, where an applicant doesn’t just submit work but demonstrates that it’s theirs. In hiring, it might look like live, observable problem-solving instead of polished documents. In corporate training, it might mean replacing checkbox completion with scenario-based simulations that require real engagement.
Whatever replaces the essay, the homework assignment, the résumé, and the compliance quiz has to share certain qualities: it has to be live or near-live. It has to be interactive. It has to make the human’s thinking visible — not just their output. And it has to create conditions where the person has a reason to care about the process, not just the product.
What all of these share is a shift from detection to demonstration. From “prove you didn’t cheat” to “show me you can think.” From artifacts to interactions.
The institutions that make this shift first won’t just solve a cheating problem. They’ll have something far more valuable: a clear picture of what the people in front of them actually know. In a world where AI can produce any artifact on demand, that clarity is going to be worth everything.



Great piece as always, Mike. It’s uncanny how synchronous our thinking continues to be. I’ve been curious to hear what you’d think about my last post, "Solving the AI Conundrum with Epistemic Weaving," (https://wesstrabelsi.substack.com/p/solving-the-ai-conundrum-with-epistemic) which specifically attempts to address what you’re talking about. I’ve identified the same core issues: prioritizing process over product, abandoning artifact importance, and promoting direct conversation between teachers and students to make thinking visible. That is literally the goal of the tool I’m imagining.
With that, I have a couple of observations/questions about your article. First, the Rotella part: students at Boston College taking classes with an award-winning professor who has a Wikipedia page have a very strong incentive to trust their professor’s advice and get on with the plan. This incentive, unfortunately, is almost completely absent from most K-12 classes. Tenth-grade Timmy, who has all but given up hope of passing a high-stakes exam, is far less likely to believe my proverbial Mr. Johnson, who, in spite of all the right feelings toward Timmy, operates in a context that actually undermines his best efforts at engaging students. What can Mr. Johnson do?
Second, I want to "double-click" on something you said about EdTech’s aim to remove friction. As an EdTech specialist myself, I’d like to counter that this is definitely not the goal—at least not for useful EdTech. On the contrary, good tools attempt to provide the intrinsic motivation so that students will willingly engage with the friction you’re talking about. Once again, it all comes down to the stakes: what’s in it for the students? If you can’t promise Timmy a high-paying, stable job like most of Rotella’s students will get, what else can we promise him?
I don't know either.
Meanwhile, in Italy, pupils aged 13-19 have always been graded and tested in this manner: viva voce, live, synchronous, in-person defence and explanation of their homework. Ask any Italian about “l’interrogazione”…