10 Comments

I too took the test and was correct, and I have to say I thought it was kind of obvious, at least for me. I have a theory as to why. I am a professor who has been teaching writing for a long, long time. You know how most people think that all faculty in university English departments study literature? Not me. My PhD is in composition and rhetoric, and for decades I have taught everything from freshman composition to graduate courses in how to teach freshman composition, plus lots of different non-fiction/poetry writing classes. I read about 1,000 pages of student writing every semester, so you start to get a sense. Sample A had a lot more detail, a better sense of voice, etc. Plus I have been studying/reading a lot about AI. Oh, and I have to wonder if so many more picked A because it was first. It would have been interesting to see what would have happened if you had been able to randomly assign which story they read first.

I know that a lot of people are really impressed with AI's abilities to write, and I think there are ways it can be useful: brainstorming, revision ideas, getting feedback, proofreading, etc. I've been teaching about these things lately with mixed results, but that's another post. Still, most of the people I know who are actually honest-to-goodness professional writers just aren't that blown away by AI writing. That's probably the difference between professional writers and some of the readers who took the challenge and picked A.

Your point is valid, Steven. For those who spend a great deal of time reading and analyzing writing -- not to mention doing it themselves -- the differences are quite noticeable. I have a hard time continuing to read posts when I get "that AI feeling," personally.

I saw it said recently that LLMs are "just a magic trick," and I broadly agree. However, for the average reader/writer, magic tricks are kind of cool, and it can be very difficult to discern how the rabbit came out of the hat in the first place.

Perhaps, piggybacking on your thought, we move to a place where more people can recognize the magic trick -- if only by virtue of the fact that more people are watching magic shows and asking "where did that come from?"

The bleak version of this is everyone walking around believing magic is real, which I think is a good analogy to sum up the writerly fears around AI.

Have you ever seen that show "Fool Us," which the magicians Penn and Teller do (or used to)? Basically, Penn and Teller were judges, and other magicians performed a trick to see if they could do something Penn and Teller couldn't figure out. So yeah, the magic comparison is right, and I think that's why you're seeing magicians in other fields (fiction writing, painting, songwriting, moviemaking, etc.) often either dismissing or shrugging at AI. It's "good enough" for people who don't engage in whatever art, but not for people who do.

By the way, I read something just today about how one of the commonly observed problems of AI writing is it isn't very good at specific/localized detail. When I think about it now, that's the thing that tipped me off that the real one was the one that referenced a very specific sign in Grant's Pass, OR.

I’m with Terry here. I (assume I) clearly see the difference here. But I read your piece beforehand and so I already knew which was which. So now I’m very curious: what if I didn’t? Would I have had any doubts then?

It would be interesting to see other writers, with different writing styles, in comparison with, again, different models. Moreover, I would like to know how the participants of the test judged their own judgment skills beforehand, and their relative level of experience with both literary and bot-written non-fiction.

That would make for such an interesting piece of research to take part in…

But thanks Mike for this nice ‘one shot’-starter! I hope to be able to take part if there is going to be any follow-up!

Your point about judging one's own judgment skills is interesting. Hence The Sicilian reference. As I brainstorm, I can't personally think of a way to measure that. Do we even have the ability to articulate when/how much we are judging our own judging skills?

If so, then this is a major checkpoint on the road to deeper metacognition. This, I think, is the great value of LLMs. This "mirror" aspect forces us to examine ourselves, not just because it's reflecting back the Internet, but because it's also reflecting back our prompt, linguistic choices, and ability to reflect deeply on what we are doing and why.

Is it intuitive? No. But as a "funhouse mirror," it sure twists the mind.

It is getting better. In another year it may be able to ape Stephen King. It will take much longer to approximate Barbara Kingsolver. The thing is, it's being trained on an enormous amount of Kent-like writing samples. Writing subreddits are full of them. But there's very little astonishingly well-written prose out there, so where is the volume of words going to come from to teach it to write with soul?

I suppose if there were a large amount of astonishingly well-written prose out there, it would no longer be considered astonishing!

Thank you, Mike. This is such a valuable post, and it took a great deal of careful work to collect and analyze the data. The end result is a compelling experimental finding that is hard to ignore: over 2/3 of your sample picked the AI text as human in origin. Personally, I've come to an AI place where it's hard for me to believe anyone could mistake a bona fide one-shot piece of bottery for a bona fide one-shot piece of real writing. The difference between A and B was night and day. Of course, once you read really excellent hybrid text with a human author fully in charge, well, it gets tough. Jason Gulya just posted that it's getting hard for him to keep track of origins because ultimately everything goes back to him, these are his thoughts, and we all rent words anyway. Very nice contribution, Mike!

Thank you, sir! "We all rent words anyway" is a great way to "wrap" this issue in its deepest truth.

Catching up on old Substacks! Really interesting angle, Mike, and absolutely an interactive bit you should add to your PD sessions. Maybe half the group knows which is AI-created before rating and the other half doesn't? Anything to get folks out of their nascent AI positions and back to an empirical evaluation of AI in education imo.

I ran something similar with students earlier this month based on Elizabeth Loppato's experience below. Encouraging students to get a bit 'over their AI skis' before debriefing the false sense of confidence/assurance when using AI tools for academic work.

https://www.theverge.com/2024/12/5/24313222/chatgpt-pardon-biden-bush-esquire?utm_content=buffer1811e&utm_medium=social&utm_source=bufferapp.com&utm_campaign=buffer
