(TL;DR: I gave my students AI-generated feedback alongside teacher-generated feedback and asked them to compare the two. Data and feedback from students via screengrabs are about 3/4 of the way through the article.)
“Wouldn’t it be nice?” - The Beach Boys
This Beach Boys classic popped into my head this morning as I read another email from an AI developer offering a tool that provides “quick, efficient, and meaningful AI-generated feedback for student writing.”
Yes, Brian Wilson, that would be nice.
“I can’t wait to grade” is the last phrase you will hear out of any teacher’s mouth, should you choose to engage them on the subject. Author John Tierney explained the root of this phenomenon in this 2013 article aptly titled “Why Teachers Secretly Hate Grading.”
The only change I would have made to the piece would have been to change the title to: “Why Teachers Openly Hate Grading.” Even Math teachers, whose assessments lean toward multiple choice, grumble at the prospect of evaluating the “show-your-work” portion of each assessment.
Humanities teachers have it the worst though.
I teach Freshman World Literature and have 112 students. Most essays run anywhere from 3-5 pages, double-spaced. At a minimum of two essays per quarter across four quarters, that’s 896 essays I will read and grade this school year — somewhere between 2,688 and 4,480 pages of student writing. That’s not counting Midterm Exams, Final Exams, and the journal entries and short responses that students complete in between.
This heavy load is the reason so many EdTech platforms are clamoring to develop the coveted AI feedback tools that change the grading game for teachers. It’ll be a cash cow, for one. Second, it should improve the classroom experience dramatically.
Intrigued by the promise of lightening the load and enriching student feedback, I decided to experiment with Brisk Teaching’s Free AI Tool back in November.
Initially, its speed and accuracy impressed me, but the feedback, while helpful, gradually revealed itself to be too generic, often suggesting students use synonyms or proofread for grammatical errors: useful, but not transformative, advice. The free version also lacked customization for my grading conventions.
Determined to give it a fair shot, I upgraded to premium and ran a controlled experiment, comparing the AI's feedback with my own and seeking my students' opinions on both.
As I worked, I secretly hoped they liked the AI feedback better than mine. That would mean I could use Brisk’s tool with a clean conscience for the rest of the year, even if it might also lead to some existential angst about my abilities as a writing teacher. No matter, I thought. The juice felt worth the squeeze. I’d have data I could really use. So I dove in.
The Procedure
First, I recognized the AI required clear, specific prompts to produce relevant feedback, a process that, while tedious, held the promise of efficiency. For the uninitiated, AI tools need some level of “prompt engineering” to narrow their scope and focus. You have to be very specific about what you want them to do. The quality of the output depends on the quality of the input.
I decided to write a “system prompt” that would direct the AI bot in the direction(s) I needed it to go, using principles of prompt engineering I learned from this course with Dr. Jules White on Coursera (highly recommended).
Step One: Curate a “system prompt” that will direct the AI.
I curated the system prompt with the goal of using the same language I use in my essay-writing conventions. Here is the system prompt I used:
1. Find the thesis statement. It should be the last 2-3 sentences of the Introduction Paragraph. Analyze it. The statement should make a clear argument with reasons that can be supported by evidence. It should also be clear and concise.
2. Find the evidence. There should be four direct quotes in the essay. Do the quotes match with the thesis statement? Sometimes you, Brisk, will need to look at two pieces of evidence together to see if they combine to make a clear argument.
3. Read the last 3-4 sentences of each body paragraph. In these locations, the student should explain the relevance or significance of the evidence and then connect it back to the thesis statement, one way or another. Does the end of the paragraph connect the evidence back to the original argument?
4. Focus on the clarity of the argument. Does the thesis live throughout the whole paper? Is the thesis clearly argued? Does the author circle back to their original argument consistently?
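Brisk handles all of the plumbing behind this, so nothing below is required to use the tool. But for the curious, here is roughly what pairing a system prompt like mine with a single essay looks like if you wire it up yourself against a generic chat-completion API. This is a minimal sketch using the OpenAI Python SDK; the model name, file name, and abbreviated prompt are my own placeholders, not Brisk’s internals.

```python
# A minimal sketch (not Brisk's internals): pair a grading-convention system
# prompt with one student essay via a generic chat-completion API.
from openai import OpenAI

SYSTEM_PROMPT = """You are giving feedback on a freshman World Literature essay.
1. Find the thesis statement (the last 2-3 sentences of the introduction) and analyze it.
2. Find the four direct quotes. Do they support the thesis?
3. Read the last 3-4 sentences of each body paragraph. Does the student connect the evidence back to the thesis?
4. Assess the clarity of the argument across the whole paper."""

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def get_feedback(essay_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": essay_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    with open("student_essay.txt") as f:  # placeholder file name
        print(get_feedback(f.read()))
```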
Given that Brisk (at the time) was only accessible via Google Docs, I had to copy and paste each individual essay into a Google Doc before running the prompt. This is one of many examples where AI use for teachers becomes tedious. I think/know that most platforms are aware of this and are working on it, but it may ultimately mean that many smaller platforms are gobbled up by the bigger sharks to create efficiencies for users over time, a la social media over the past fifteen years. No one wants to manually reconcile multiple platforms against one another to gain the efficiencies of artificial intelligence.
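In the meantime, some of that copy/paste can be scripted by a teacher who is comfortable with a little code. Below is a minimal sketch, assuming the Google Docs API is enabled in a Google Cloud project and a service-account key file is on hand; the folder and file names are placeholders I chose for illustration.

```python
# A minimal sketch: bulk-create one Google Doc per essay text file.
# Assumes the Google Docs API is enabled and credentials.json is a
# service-account key; folder and file names are placeholders.
from pathlib import Path

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/documents"]
creds = service_account.Credentials.from_service_account_file(
    "credentials.json", scopes=SCOPES
)
docs = build("docs", "v1", credentials=creds)

for essay_file in Path("essays").glob("*.txt"):
    # Create an empty Doc titled after the file, then insert the essay text.
    doc = docs.documents().create(body={"title": essay_file.stem}).execute()
    docs.documents().batchUpdate(
        documentId=doc["documentId"],
        body={"requests": [
            {"insertText": {"location": {"index": 1}, "text": essay_file.read_text()}}
        ]},
    ).execute()
    print(f"Created Doc for {essay_file.stem}")
```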
(Another incongruity/point of tedium is this: If I want to assess and monitor my students’ use of an AI tool in their own work — like ChatGPT or Character.ai — they have to copy/paste the entire chat into a separate document and send it to me. The incompatibility with Microsoft Word and Google Docs leads to these conversations becoming very difficult to read — unless students format the chat transcripts by hand in a user-friendly way. More on this to come.)
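Even the readability problem is scriptable if students follow a simple convention when they paste. The sketch below assumes a hypothetical plain-text transcript with “Student:” and “AI:” speaker labels (not any chatbot’s real export format) and reflows it into tidier Markdown.

```python
# Hypothetical cleanup script: reflow a pasted chat transcript into Markdown.
# Assumes plain text with "Student:" and "AI:" speaker labels, which is a
# convention I made up, not any chatbot's real export format.
import sys

def to_markdown(raw: str) -> str:
    lines = []
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped.startswith("Student:"):
            lines.append(f"\n**Student:** {stripped.removeprefix('Student:').strip()}")
        elif stripped.startswith("AI:"):
            lines.append(f"\n**AI:** {stripped.removeprefix('AI:').strip()}")
        elif stripped:
            lines.append(stripped)  # continuation of the previous turn
    return "\n".join(lines)

if __name__ == "__main__":
    # Usage: python format_transcript.py < raw_transcript.txt > transcript.md
    print(to_markdown(sys.stdin.read()))
```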
Step Two: Analyze the AI feedback
Below are two examples of feedback that the AI tool generated.
AI Feedback (Student 1):
☀️ Glow: Your thesis statement is clear and concise, outlining your main points of analysis: "Throughout my conversation with the Holden ai chatbot, I found that it was effective in furthering my understanding of the character Holden Caulfeild and the specifications of his life. The chatbot also did a good job of matching up with the Holden Caulfield that was presented in the novel." You've also done a great job using direct quotes from your conversation with the chatbot and the book to support your argument.
🌱 Grow: While you've used quotes effectively, it would be beneficial to delve a bit deeper into the analysis of these quotes. You've made the connection between the chatbot's quotes and the book's quotes, but you could expand on why these connections are significant and how they support your thesis.
Overall, your essay is well-structured and your argument is clear. With a bit more in-depth analysis and exploration of potential limitations of the chatbot, your argument could be even stronger.
AI Feedback (Student 2):
☀️ Glow: Your essay is well-organized and provides a thorough analysis of your interactions with the AI chatbot as Holden Caulfield. Your thesis statement, "During my experiment with the AI, I found that, the AI chatbot of Holden Caulfield is not accurate in knowing about the characters and their relationships and opinions in the novel. However, it furthered my understanding of Holden and was effective in explaining the main themes of the novel," clearly outlines your argument and the points you plan to address.
🌱 Grow: While your thesis statement is clear, it could be more concise. Consider revising it to something like, "Although the AI chatbot of Holden Caulfield inaccurately portrays characters and their relationships from the novel, it enhanced my understanding of Holden and effectively conveyed the novel's main themes."
In terms of evidence, you've done a good job including direct quotes from both the novel and your AI conversation. However, it's important to ensure that your evidence directly supports your thesis. For example, the quote, “I’m happy to say that I did! … This semester, I was on honor roll for the first time in years,” is a great piece of evidence, but it's not clear how it supports your argument that the AI chatbot effectively explained the main themes of the novel.
💭 Wondering: I wonder if you could better connect the evidence back to your thesis in the concluding sentences of your body paragraphs. For example, after presenting the evidence about Holden's relationship with Allie, you conclude that the AI does not match up with Salinger's version of Holden. But how does this support your thesis that the AI chatbot inaccurately portrays characters and their relationships?
Lastly, you've done a good job maintaining a clear argument throughout your essay, but I wonder if you could make your argument even stronger by consistently circling back to your original thesis. For example, when you discuss how the AI chatbot furthered your understanding of Holden, you could explicitly tie this back to your thesis by saying something like, "This supports my argument that the AI chatbot, despite its inaccuracies, enhanced my understanding of Holden."
The first thing I noticed was that the “better writers” in the class received much more feedback than my writers who struggled. Student 2 wrote two more pages than Student 1 on this essay and consistently earns the highest marks in the class. He received five bullet points’ worth of feedback to Student 1’s three.
Second, the tool only sometimes followed my directions. Student 1’s feedback covered two of my four directives, while Student 2 received feedback on all four. This, I think, was because Student 2 followed my writing conventions more closely, making it easier for Brisk to match my conventions against the student’s writing. Interestingly, this meant that Brisk was much better at giving meaningful feedback to “good” writers than to those who struggled to follow my rubric. This is actually quite similar to the teacher experience of giving feedback: it is easier to “talk” to a student who knows what they are doing than to one who does not.
Last, the final bullet point for Student #1 felt like a throwaway: “Overall, your essay is well-structured and your argument is clear. With a bit more in-depth analysis and exploration of potential limitations of the chatbot, your argument could be even stronger.”
I tried to reserve judgment. After all, who cares what I think? If the student finds the feedback useful and their writing improves, then the feedback should be considered effective and I should continue using it. Maybe, for example, Student 1 is not able to digest five bullet points’ worth of feedback. Maybe the more concise nature of his feedback would be exactly what he needed.
Step Three: Pair it with Teacher Feedback
For this experiment, I gave slightly less feedback than I usually give on essays. Here is the feedback I provided for Students 1 and 2:
Student 1:
Mr. Kentz’s Notes: At the end of BP1, you say that the HoldenAI bot helped further your understanding of the real Holden a couple of times, but never really say how. What did you learn? How did you understand him further?
BP2: You did a good job of proving that the bot matched the book - but how did that affect your understanding? Was it more of the same? Did it change things, or no?
Student 2:
Mr. Kentz’s Notes: The AI bot feedback has a lot to say, so I'll keep this brief. Your writing is very thoughtful and thorough, you have great skill in this area. On a very small level, you have a tendency to write in fragments or with small word choice errors that confuse the reader. For example, the second sentence here: "This makes it clear that Holden has hatred towards phonies. As he connects this term with the society of the adult world."
Proofread your essay closely for fragments in the future.
As you can see, my feedback was much more personalized. For the first student, it drove to the heart of the essay prompt itself, rather than focusing on process. Some writing teachers would say this is an ineffective approach because it is less transferable, but in some cases when the student “misses the point” I believe it is necessary to focus on content rather than process.
For the second student, I focused on his penchant for using fragments - an area I would normally have skipped over in favor of “meatier” feedback. In this case, I felt comfortable commenting on it since the AI had done such a thorough job.
Step Four: Gather student opinions and feedback
I generated a short survey to find out what my students thought of the AI feedback compared with mine. I surveyed 51 students. The first two questions were “Did you find the AI feedback understandable?” and “Did you find the AI feedback useful?” Below is the breakdown:
As you can see, just under 70% found the AI feedback both understandable and useful. Great news!
Further, students on average rated its usefulness at a 7 out of 10.
However, sentiment flipped almost exactly when I asked them which mode of feedback they would prefer - AI Feedback or Teacher Feedback:
This is not the first time that teenaged students have left me scratching my head.
In essence, they like the feedback, find it very useful and understandable, and yet still want my feedback over that of the AI? What gives?
Below is a screenshot of how some of them explained themselves:
Student 1 wrote: “the ai feedback was a little simular to the teacher but the ai machine made evrything sound a little un natural just because it isnt natural.”
Student 2 wrote (included above): “I didn't find it to be very different than Mr. Kentz feedback. It seemed clear and organized in clearly displaying what I did well and what I need to improve on. However, I think Mr. Kentz's feedback is a little bit easier to understand in what ways I should improve. This seemed to be a very reliable AI feedback, but I still like the way Mr. Kentz displays feedback better in giving direct examples in ways that I should improve or that I did well.”
The most common response was “it was more of the same,” meaning the AI reiterated much of the same feedback I had been giving for the first several essays of the year. This should have been a good thing, I thought, until I realized that there were shades of distrust in the student comments.
They didn’t think the Brisk feedback was wrong, per se; they just preferred the personal touch. Some wrote that the feedback veered into “generic” territory, even though it was still “better than nothing,” as one student put it.
I found myself thinking: What would have happened if I gave them the AI feedback without mine to compare it against? Would they have noticed? Would they have incorporated it into their writing approach? Would it have helped?
These questions underscore the need for ongoing research and data capture on the efficacy of AI tools in education. These experiments are difficult to run with or without “controls.” We need several rounds of data via several different methodologies before we can draw wide-ranging conclusions (especially with a sample size this small).
What do students really want in writing feedback?
As a throw-in, I thought I’d ask them for general opinions regarding writing feedback: What would make essay feedback better, whether coming from the teacher or an AI?
They want more examples…but they also want it shorter and more concise. If AI can figure out how to do that, we’ll be golden!
Grade the Process, Not the Output
In my opinion, the debate around AI-assisted writing is moving towards a framework where teachers grade the process and not the output. I am not the first to say this and I will not be the last. Friend and fellow AI-in-writing thought leader Nick Potkalitsky wrote about this recently on his blog, and I think he is right.
Importantly, EdTech companies are thinking about this as well. I was in attendance at the 2024 AI in Education Summit at Stanford in January when Sal Khan spoke about this exact subject during a Fireside Chat with Chris Piech.
(Scroll to 22:28 in the video clip for reference.)
“We’re doing some really fun things around writing where a teacher can create the assignment, create the rubric, and Khanmigo works with the student on it and then reports back to the teacher the entire process and it can even give insights on the process across students hey these kids are having trouble with thesis statements…these kids are having trouble outline…”
Developing AI writing tools that offer a class dashboard for real-time insight into students' struggles could dramatically transform the teaching of writing. Currently, monitoring student interactions with AI is cumbersome and limits project execution. However, the interest of EdTech platforms in facilitating this transparency is promising, as it has the potential to significantly enhance the writing instruction process.
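The underlying idea is straightforward: tag each piece of AI feedback with the rubric criterion it addresses, then aggregate across the class. Here is a hypothetical sketch of that aggregation step; the criterion names and data shape are my own assumptions, not Khanmigo’s (or any other product’s) schema.

```python
# Hypothetical sketch of class-level "process" insights: tally which rubric
# criteria students are struggling with. Criterion names and the data shape
# are assumptions for illustration, not any real product's schema.
from collections import Counter
from dataclasses import dataclass

@dataclass
class CriterionFlag:
    student: str
    criterion: str    # e.g. "thesis", "evidence", "closing sentences", "clarity"
    needs_work: bool

def class_summary(flags: list[CriterionFlag]) -> Counter:
    """Count how many students were flagged on each rubric criterion."""
    struggling = {(f.student, f.criterion) for f in flags if f.needs_work}
    return Counter(criterion for _, criterion in struggling)

if __name__ == "__main__":
    flags = [
        CriterionFlag("Student 1", "thesis", False),
        CriterionFlag("Student 1", "evidence", True),
        CriterionFlag("Student 2", "thesis", True),
        CriterionFlag("Student 2", "evidence", True),
    ]
    # Counter({'evidence': 2, 'thesis': 1}) -> "these kids are having trouble with evidence"
    print(class_summary(flags))
```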
Apologies for the long post, but I hope the juice was worth the squeeze!
Please reach out with thoughts, comments, and ideas.