Clearly aware of educators freaking-the-f-out over ChatGPT, OpenAI has introduced a new human-vs-machine text classifier, and they are making it available to the public to gather feedback.

To discriminate between writing produced by a human and text authored by AI from different providers, a new classifier has been built. Although the classifier is not entirely trustworthy, it is thought to be able to reduce unfounded claims that material produced by AI was actually written by a person. With a 9% false positive rate, the classifier correctly labels 26% of AI-written text as “possibly AI-written.” The classifier is being made available to the general public for comments and further development. It should be used in conjunction with other techniques for identifying the source of material because it has limitations, such as being unreliable for short texts and texts written in languages other than English. On a dataset containing text generated by humans and artificial intelligence, the classifier is a language model that has been refined. It is being discussed how this classifier will affect education, and input from educators and other stakeholders who will be directly impacted is being sought.

Note the preceding paragraph was produced by:

  1. Taking the release notes and asking ChatGPT to summarize them.
  2. Dropping that text into QuillBot article spinner.
  3. Running it through OpenAI detector.

Results:

So, I think those using a pipeline like that are well in the clear with some minor tweaks. A rough sketch of the pipeline follows.
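For the curious, here's a minimal sketch of that three-step pipeline. Step 1 uses the real OpenAI completions API; QuillBot has no public API and the detector shipped as a web app, so `spin_text` and `detect_ai` below are hypothetical placeholders for what were, in practice, manual copy-paste steps.

```python
# Sketch of the summarize -> spin -> detect pipeline described above.
# Assumes the `openai` package and an OPENAI_API_KEY environment variable.
# spin_text() and detect_ai() are hypothetical stand-ins: QuillBot exposes
# no public API, and OpenAI's classifier launched as a web app only.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def summarize(release_notes: str) -> str:
    """Step 1: ask an OpenAI model to summarize the release notes."""
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Summarize the following release notes:\n\n{release_notes}",
        max_tokens=300,
        temperature=0.7,
    )
    return resp["choices"][0]["text"].strip()

def spin_text(text: str) -> str:
    """Step 2 (hypothetical): run the text through an article spinner."""
    raise NotImplementedError("No public QuillBot API; this was a manual step.")

def detect_ai(text: str) -> str:
    """Step 3 (hypothetical): submit to OpenAI's classifier web app."""
    raise NotImplementedError("The classifier is a web app; also a manual step.")

if __name__ == "__main__":
    summary = summarize(open("release_notes.txt").read())
    print(detect_ai(spin_text(summary)))  # both calls are manual in practice
```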

We’ve trained a classifier to distinguish between text written by a human and text written by AIs from a variety of providers. While it is impossible to reliably detect all AI-written text, we believe good classifiers can inform mitigations for false claims that AI-generated text was written by a human.

Limitations
Our classifier has a number of important limitations. It should not be used as a primary decision-making tool, but instead as a complement to other methods of determining the source of a piece of text. (A minimal pre-check sketch follows the list below.)

  • The classifier is very unreliable on short texts (below 1,000 characters). Even longer texts are sometimes incorrectly labeled by the classifier.
  • Sometimes human-written text will be incorrectly but confidently labeled as AI-written by our classifier.
  • We recommend using the classifier only for English text. It performs significantly worse in other languages and it is unreliable on code.
  • Text that is very predictable cannot be reliably identified. For example, it is impossible to predict whether a list of the first 1,000 prime numbers was written by AI or humans, because the correct answer is always the same.
  • AI-written text can be edited to evade the classifier. Classifiers like ours can be updated and retrained based on successful attacks, but it is unclear whether detection has an advantage in the long-term.
  • Classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction.
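Given the short-text and English-only caveats above, anyone wrapping this classifier in a tool would want to gate inputs before trusting a verdict. A minimal sketch, assuming the third-party `langdetect` package for language identification; `classify()` is a hypothetical placeholder for whatever detector call you end up with:

```python
# Pre-check gate reflecting the stated limitations: refuse to classify text
# under 1,000 characters or text that doesn't look like English.
# Assumes the third-party `langdetect` package; classify() is hypothetical.
from langdetect import detect

MIN_CHARS = 1_000  # the classifier is "very unreliable" below this length

def should_classify(text: str) -> bool:
    if len(text) < MIN_CHARS:
        return False  # too short for any verdict to mean much
    try:
        if detect(text) != "en":
            return False  # performs significantly worse outside English
    except Exception:
        return False  # language detection failed; stay out of scope
    return True

def classify(text: str) -> str:
    raise NotImplementedError("Hypothetical: plug your detector in here.")

def check(text: str) -> str:
    return classify(text) if should_classify(text) else "out of scope for the classifier"
```

None of this makes the verdict trustworthy, of course; it just avoids the cases the announcement explicitly warns about.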
Training the classifier
Our classifier is a language model fine-tuned on a dataset of pairs of human-written text and AI-written text on the same topic. We collected this dataset from a variety of sources that we believe to be written by humans, such as the pretraining data and human demonstrations on prompts submitted to InstructGPT. We divided each text into a prompt and a response. On these prompts we generated responses from a variety of different language models trained by us and other organizations. For our web app, we adjust the confidence threshold to keep the false positive rate very low; in other words, we only mark text as likely AI-written if the classifier is very confident.
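That last sentence is ordinary threshold tuning: on held-out human-written text, pick the score cutoff that caps the false positive rate at some target, and only flag text scoring above it. A minimal sketch with numpy; the scores and the 1% target below are illustrative assumptions, not OpenAI's actual numbers:

```python
# Threshold tuning as described above: choose a cutoff so that at most
# target_fpr of held-out HUMAN-written texts score above it, then flag
# text as "likely AI-written" only beyond that cutoff.
import numpy as np

def pick_threshold(human_scores: np.ndarray, target_fpr: float = 0.01) -> float:
    """Cutoff whose false positive rate on human-written text is ~target_fpr."""
    return float(np.quantile(human_scores, 1.0 - target_fpr))

def label(score: float, threshold: float) -> str:
    return "likely AI-written" if score > threshold else "unclear"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical classifier scores in [0, 1] on a held-out human-written set.
    human_scores = rng.beta(2, 5, size=10_000)
    t = pick_threshold(human_scores)
    print(f"threshold = {t:.3f}")
    print(label(0.95, t))  # flagged only when the classifier is very confident
```

Pushing the false positive rate down this way is exactly why the hit rate on AI-written text is so modest: a stricter cutoff trades recall for precision.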

https://openai.com/blog/new-ai-classifier-for-indicating-ai-written-text/