WSJ Exclusive: OpenAI’s Alleged AI Detection Capabilities
The company has reportedly held back an effective watermarking system for over a year
The Wall Street Journal yesterday reported that OpenAI has developed an AI detection tool that would help educators determine whether or not students used ChatGPT to write all or a large portion of a written assignment. The article claims they have had the new anti-cheating system ready for a full year but have refused to release it for a number of reasons.
The system reportedly changes the way ChatGPT selects tokens to ensure that its outputs include an invisible watermark. The article claims that the watermarks are 99.9% effective “when enough new text is generated by ChatGPT.”
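The Journal does not describe OpenAI’s actual mechanism, but token-level watermarking schemes published in the research literature (the “green list” approach is the best-known example) give a rough sense of how biasing token selection can leave an invisible, statistically detectable signature. The sketch below is purely illustrative; the toy vocabulary, the stand-in generator, and every constant in it are my own assumptions, not anything attributed to OpenAI.

```python
import hashlib
import math
import random

# Toy illustration of a "green list" watermark: at each step, a pseudorandom half
# of the vocabulary is marked green based on the previous token, and generation is
# softly nudged toward green tokens. A detector that knows the rule can then count
# how many tokens landed on their green lists.
VOCAB = [f"tok{i}" for i in range(1000)]   # stand-in vocabulary
GREEN_FRACTION = 0.5                        # share of the vocabulary marked green
GREEN_BOOST = 4.0                           # how strongly sampling favors green tokens

def green_list(prev_token):
    """Derive this step's green list deterministically from the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))

def generate(n_tokens, watermark):
    """Sample tokens from a uniform toy 'model', optionally biased toward green tokens."""
    rng, out = random.Random(0), ["<start>"]
    for _ in range(n_tokens):
        greens = green_list(out[-1])
        weights = [GREEN_BOOST if (watermark and t in greens) else 1.0 for t in VOCAB]
        out.append(rng.choices(VOCAB, weights=weights, k=1)[0])
    return out[1:]

def detection_z_score(tokens):
    """How far the observed count of green tokens exceeds what chance would predict."""
    hits, prev = 0, "<start>"
    for t in tokens:
        hits += t in green_list(prev)
        prev = t
    n = len(tokens)
    expected = n * GREEN_FRACTION
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (hits - expected) / std

print(detection_z_score(generate(200, watermark=True)))   # large positive z-score
print(detection_z_score(generate(200, watermark=False)))  # near zero
```

One property of any scheme built this way is that detection confidence grows with length: a sentence or two yields only a weak statistical signal, while a full essay yields an overwhelming one. That may be why the reported 99.9% figure is qualified with the “enough new text” caveat.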
Notwithstanding the ambiguity of that “enough new text” caveat, the implications of this report are wide-ranging.
The internal tensions reported in the article, if true, only underscore the year-long drama that has followed OpenAI as a company. Not only that, the quotes and references to internal memos in the article point to internal leaks from disgruntled employees.
It is difficult to discern the most prominent reason that Sam Altman and Mira Murati have decided not to release the tool, since there are at least four major concerns at the highest levels, depending on how you read the article. However, reading between the lines indicates to me that the largest concerns are commercial in nature. A third of their most loyal users said in an April survey they would decrease their use of ChatGPT if the anti-cheating technology were deployed. The article states that the results of that survey “loomed large” at an early-June meeting on the subject.
Less commercially driven concerns included the worry that non-native English speakers would get dinged by the detection system disproportionately often. I find this confusing. I know that existing detection systems discriminate against non-native speakers through their use of statistical measures of word probabilities, but those systems are built from the outside looking in and don’t use a watermarking tool. Wouldn’t the watermarking tool – deployed within the very system that is creating the language – remove this source of bias? I am not asking rhetorically. If you know the answer, please feel free to comment.
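For contrast, here is a toy sketch of the “outside looking in” approach those existing detectors take: they never see the generator and simply score how statistically predictable the submitted text is, flagging text that looks too predictable. Everything in it is a made-up stand-in for illustration; real tools score against a language model’s token probabilities rather than raw word frequencies.

```python
import math
from collections import Counter

# Toy "outside-in" detector in the spirit of perplexity-based tools. It scores how
# predictable each word is under a reference frequency model; the reference corpus,
# smoothing, and example sentences are placeholders for illustration only.
REFERENCE = ("the students wrote the essay and the teacher read the essay "
             "and the students used the book to write the essay").split()
FREQ = Counter(REFERENCE)
TOTAL = sum(FREQ.values())
VOCAB_SIZE = len(FREQ)

def avg_surprisal(text):
    """Mean negative log-probability per word, with add-one smoothing."""
    words = text.lower().split()
    probs = [(FREQ[w] + 1) / (TOTAL + VOCAB_SIZE) for w in words]
    return sum(-math.log(p) for p in probs) / len(words)

# Plain, high-frequency phrasing scores as highly predictable (low surprisal) and
# risks being flagged; idiosyncratic phrasing scores higher and looks more "human".
print(avg_surprisal("the students wrote the essay"))              # low surprisal
print(avg_surprisal("my rambunctious cohort improvised wildly"))  # high surprisal
```

The bias complaint against detectors of this kind is usually framed exactly that way: writers working in a second language tend toward simpler, more formulaic phrasing, which scores as predictable and therefore machine-like. Whether an inside-the-model watermark would inherit any of that problem is the question I am posing above.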
There are also concerns that bad actors could reverse-engineer the watermarking system if it were widely deployed. Again, this sounds dubious to me. Other than the intellectual property factor, why is it a concern if a programmer reverse-engineers the watermarking system? In what malicious way could a hacker use a watermarking technology? Once again, I am not asking rhetorically. If you know the answer, please chime in.
According to internal tests, the anti-cheating system does not degrade ChatGPT’s outputs; potential degradation had been a major initial (and valid) concern.
This final conclusion sounds like it may have been the straw that broke the camel’s back, spurring some employees to leak the information. The last two quotes of the article reflect the rising internal temperature at the company and the frustration level of some staffers interested in the tool’s release.
“Our ability to defend our lack of text watermarking is weak now that we know it doesn’t degrade outputs,” employees involved in the testing concluded, according to the internal documents.
“Without [releasing this system,] we risk credibility as responsible actors,” a summary of the June meeting said.
From these quotes, it is hard not to conclude that the employees in favor of releasing it gave their superiors a chance to do what they thought was right in June. When Altman and Co. balked, they went to the press.
Having been in these reporters’ shoes, I can tell you that it takes a lot for an employee to “turn” on their own company, no matter how convinced they are of their own stance. Usually, people simply leave, at which point they lose both access and credibility. It is rare to stay and blow the whistle.
Coming on the back of the resignations of leading members of the safety and superalignment team and – who could forget – the boardroom drama of last fall, this is yet another bad look.
Perhaps this article will put pressure on OpenAI executives to release the tools. Perhaps not. It should at least force them to answer questions. On the whole, that is a good thing.
Put it this way: if OpenAI really does have the capability to detect ChatGPT-produced language at a 99.9% efficacy rate, they owe it to every teacher, student, parent, and administrator in the country to explain why they haven’t released it yet.
(Keep an eye out for a subtle but coordinated public relations response. If and when it comes, not only the language but also the method and mode of delivery will be telling.)
Short-Term Questions and Impacts
The existence of such an efficacious tool is widely relevant to the education industry. Even though it is only one tool and one AI platform, it is hard not to imagine other AI companies following suit, were OpenAI to release the tool and experience success. In fact, the article notes that Google is beta-testing a similar watermarking tool called SynthID.
Furthermore, the surprise release of ChatGPT in November 2022 has wreaked absolute havoc on the education industry. Teachers have had little opportunity to digest and respond to the impact of the tool’s release. In that vein, it seems likely the company will be pressured — either by internal or external forces — to release the tool at some point to determine its efficacy.
Hypothetically speaking, let’s assume this happens by January. Here is a (very) quick rundown of questions and implications for teachers, students, and administrators:
The question of who gets access is a relevant one. Is this a free tool? For whom? District administrators and .edu accounts? What about high school teachers? Private and charter schools?
What if they decide to charge a fee? That would be something. In my view, this would be like an antivirus company releasing a virus into the market that only their software could catch and eliminate, and then charging everyone to buy their software.
Technically speaking, even the release of perfect ChatGPT-detection software would not immediately solve the problem. My students could simply migrate to Claude, Gemini, Llama, Mistral, or another platform. However, one could argue that a watermarking breakthrough could create strong political impetus for the creation of industry standards that every AI platform would be required to implement, perhaps even formal regulations.
Longer-Term
On a philosophical level, the release of the anti-cheating watermarking system could create a bizarre cat-and-mouse dynamic in which people use the tool and are constantly checked for it. On the other hand, the proliferation of AI-generated text is a problem not just for the schooling system, but beyond it as well. Professional employees are using it to produce large swathes of text that they never check or understand. If there were a watermarking system, it would certainly force a higher level of responsibility across the board. Ultimately, that may water down expectations for widespread societal transformation at the hands of Large Language Models. On a commercial level, one can see why OpenAI executives may be worried about releasing this tool and cannibalizing their own business.
Were it to be released, would this remove the need to teach students how to use LLMs “the right way”? No, but it would slow things down. It would give educators a much-needed, albeit brief, respite from the onslaught that AI has brought to their industry.
Lastly, if you read the article itself, you may have noticed that I was quoted in it. Here is my quote, in case you do not have a subscription.
To provide a quick point of clarity, I did not know the subject of the article when I spoke with the reporter. I thought the article was about assessments (which you could say it was), but I knew nothing about the detection tool. For obvious reasons, the reporter would not have been able to share the details of the story with me ahead of time.
This is not meant as a defense, but as a segue to my final point.
If I had known the topic of the story, I would have stopped the conversation and asked the reporter a question myself, which I will now pose to you: What if I told you that we already have a system that can determine whether or not a student wrote the essay they handed in? What if I told you we have had it for almost fifty years and have used it to great success? What if I told you it has already been vetted, tested, run through vast experiments, and developed by some of the brightest linguistic minds in academia? Would you believe me?
Furthermore, would you still be talking about using AI detection software? Would this even be a story? Wouldn’t we already be employing a tool to help us determine when a student received too much writing help, whether it was human or robotic?
I would think we would.