Because large language models work by predicting the next word in a sentence, they are more likely to use common words such as "it" or "is" than rarer, more distinctive ones. Research by Ippolito and her colleagues found that this statistically predictable writing is exactly the kind of text that automated detector systems are good at picking up.
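To see that signal concretely, here is a minimal sketch of a likelihood-based check, assuming the Hugging Face transformers library and off-the-shelf GPT-2 as the scoring model; the model choice, the example sentences, and the idea of thresholding perplexity are illustrative assumptions, not details from Ippolito's study.

```python
# Minimal sketch: machine-generated text tends to be highly predictable to a
# language model (low perplexity), while human writing is more surprising.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2; lower means more predictable text."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return the average
        # next-token cross-entropy over the sequence.
        loss = model(input_ids=ids, labels=ids).loss
    return torch.exp(loss).item()

# Bland, common-word phrasing scores low (model-like); quirky, typo-ridden
# prose scores high (human-like). A real detector would calibrate a threshold
# on many examples rather than eyeballing two sentences.
print(perplexity("It is what it is, and that is all that it is."))
print(perplexity("honeslty the brunch vibes were immaculate, crumbs everywhre tho"))
```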
The human participants in Ippolito's study, however, tended to judge that same polished, machine-generated text as written by a person. In reality, human-written text is rife with typos and is incredibly variable, incorporating different styles and dialects, while language models, Ippolito notes, are far better at producing flawless text. A mistake in a text, she says, is a good sign that it was written by a human.
Language models can also be turned on themselves to detect AI-generated text. Muhammad Abdul-Mageed, who holds the Canada Research Chair in natural-language processing and machine learning, says one of the most successful ways to do this is to fine-tune a model on a mix of texts written by humans and texts generated by machines, so that it learns to tell the two apart.
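As a concrete illustration of that approach, the sketch below fine-tunes a small pretrained model as a binary human-versus-machine classifier with the Hugging Face Trainer API; the base model, the two-example dataset, and the hyperparameters are illustrative assumptions, not details from Abdul-Mageed's work.

```python
# Sketch of the classifier approach: fine-tune a pretrained model on texts
# labeled human (0) vs. machine-generated (1) so it learns to tell them apart.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

class LabeledTexts(Dataset):
    """Wraps parallel lists of texts and 0/1 labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# Toy training set; a real detector would use thousands of examples per class.
train_set = LabeledTexts(
    ["tbh the meeting ran long and i kinda zoned out, my bad",                        # human
     "The meeting covered several topics and it is clear that it was productive."],  # machine
    [0, 1],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ai-text-detector",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_set,
)
trainer.train()  # afterwards, model(**inputs).logits scores human vs. machine
```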
Scott Aaronson, a computer scientist at the University of Texas on secondment as a researcher at OpenAI, has been developing watermarks for longer pieces of text generated by models such as GPT-3: imperceptible signals hidden in the model's word choices that can later be used to prove the text came from an AI system.
OpenAI is working on watermarks, and its policies already state that users should clearly indicate text generated by artificial intelligence.
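To show what a text watermark can look like in practice, here is a minimal sketch of one published scheme, the "green list" approach of Kirchenbauer et al.; it is not necessarily the method Aaronson or OpenAI is building, and the toy vocabulary and hashing details are assumptions made purely for illustration.

```python
# Sketch of a "green list" watermark: a hash of the previous token seeds a RNG
# that splits the vocabulary in half, generation softly favors the "green"
# half, and a detector that knows the hashing scheme counts green tokens.
import hashlib
import random

VOCAB = ["the", "it", "is", "cat", "sat", "on", "mat", "a"]  # toy vocabulary

def green_list(prev_token: str, fraction: float = 0.5) -> set:
    """Deterministically derive the green half of the vocabulary from prev_token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def green_fraction(tokens: list[str]) -> float:
    """Fraction of tokens that fall in the green list for their context.
    Watermarked text scores well above the ~0.5 expected by chance."""
    hits = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# During generation the model would boost the logits of green-list tokens;
# here we only show the detection side on a toy token sequence.
print(green_fraction(["the", "cat", "sat", "on", "the", "mat"]))
```

Because the split looks random to anyone without the hashing secret, the signal is invisible to readers but statistically obvious to whoever holds the key, which is what makes it useful for proving provenance over longer passages.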
These technical fixes all come with big caveats, though. Most detection tools don't stand a chance against the newest language models, because they were built on GPT-2 or other earlier models. Many of them also work better when there is plenty of text to analyze; they will be less effective in use cases such as email assistants, which involve short conversations and give the detector little data to work with. And using large language models for detection requires powerful computers and access to the AI model itself.