The rapid development of artificial intelligence, and especially OpenAI's launch of ChatGPT with its remarkably accurate and coherent answers and dialogues, has shifted public awareness and set off a new wave of interest in large language models (LLMs). It became clear that their capabilities are greater than many had imagined. Headlines reflected both excitement and concern: Can chatbots write a cover letter? Can they help students pass exams? Will bots influence voters through social media? Can they produce new designs instead of artists? Will they put writers out of a job?
After the high-profile release of ChatGPT, similar models from Google, Meta, and other companies are now being discussed. Researchers are calling for closer scrutiny: they believe society needs a new level of infrastructure and tooling to safeguard these models, and some have focused on building that infrastructure.
One key piece of that safety infrastructure would be a tool that lets teachers, journalists, and citizens distinguish text generated by an LLM from text written by a person.
To this end, Eric Anthony Mitchell, a fourth-year computer science PhD student at Stanford University, developed DetectGPT with his colleagues. Released as a demo and a paper, it distinguishes LLM-generated text from human-written text. In initial experiments, the tool correctly determined authorship 95% of the time across five popular open-source LLMs. The tool is at an early stage of development, but Mitchell and his colleagues are working to make it genuinely useful to society.
Some general approaches to the authorship-identification problem had been studied before. One approach, used by OpenAI itself, involves training a model on two kinds of text: some generated by an LLM and some written by people. The model is then asked to classify new text. However, according to Mitchell, for this solution to succeed across subject areas and languages, the method would require a huge amount of training data.
A second approach avoids training a new model at all: the text in question is simply passed to an LLM, which is used to recognize its own output.
Essentially, the technique consists of asking the LLM how much it “likes” a text sample, says Mitchell. “Like” here does not imply that the model is sentient or has preferences of its own; if the model “likes” a text, it simply means the model assigns that text a high score. Mitchell suggests that if the model likes a text, the text was probably generated by it or by a similar model; if it does not, the text was probably not created by an LLM. According to Mitchell, this approach already works much better than random guessing.
Mitchell hypothesized that even the most powerful LLMs have subtle biases toward phrasing an idea one way rather than another. A model will tend to “like” any slight paraphrase of its own output less than the original. In contrast, if you perturb human-written text, the model is about as likely to like the result more as to like it less than the original.
Mitchell also realized that this theory could be tested with popular open-source models, including those available through the OpenAI API. After all, calculating how much a model likes a particular piece of text is essentially the core of how the model is trained, so this quantity comes almost for free.
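To make the idea concrete, here is a minimal sketch, not the team's actual code, of what “how much the model likes a text” means in practice: the average log-probability a causal language model assigns to the text's tokens. The Hugging Face transformers library is assumed, and GPT-2 is an illustrative stand-in for any open-source LLM.

```python
# Minimal sketch: "how much the model likes a text" as the average
# log-probability the model assigns to the text's tokens.
# Assumes the Hugging Face transformers library; GPT-2 is a stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # illustrative; any open-source causal LLM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def avg_log_prob(text: str) -> float:
    """Average per-token log-probability of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # The causal-LM loss is the mean negative log-likelihood per token
    # (the very quantity the model is trained on), so negating it gives
    # the "how much the model likes this text" score.
    return -out.loss.item()
```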
To test their hypothesis, Mitchell and his colleagues ran experiments observing how much various publicly available LLMs liked human-written text as well as their own LLM-generated text. The selection of texts included fake news articles, creative writing, and academic essays. The researchers also measured how much the LLMs liked, on average, 100 perturbations of each LLM-generated and human-written text. The team then computed the difference between these two numbers, for LLM texts and for human-written texts, and saw two bell curves that barely overlap. They concluded that the source of a text can be distinguished very well using this single value, giving a far more reliable result than methods that simply measure how much the model likes the original text.
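Building on the scoring function above, that measurement could be sketched as follows. The `perturb` helper here is a deliberately crude, hypothetical stand-in (random word dropout) for a real paraphrasing step, and the decision threshold would be fit on texts of known origin.

```python
# Sketch of the perturbation test. `perturb` is a toy, hypothetical
# stand-in for a real paraphrasing model; `avg_log_prob` comes from
# the previous sketch.
import random
import statistics

def perturb(text: str, frac: float = 0.1) -> str:
    """Crude perturbation: randomly drop a small fraction of words.
    The real method lightly rewords the text instead."""
    words = text.split()
    kept = [w for w in words if random.random() > frac]
    return " ".join(kept) if kept else text

def perturbation_gap(text: str, n: int = 100) -> float:
    """How much more the model likes the original text than the
    average of n perturbed versions of it. LLM-generated text tends
    to show a noticeably larger gap than human-written text."""
    original = avg_log_prob(text)
    perturbed = statistics.mean(avg_log_prob(perturb(text)) for _ in range(n))
    return original - perturbed

# A sample is flagged as machine-generated when the gap exceeds a
# threshold fit on texts of known origin (the value here is made up):
# is_machine = perturbation_gap(sample) > 0.1
```

The two bell curves described above are the distributions of this gap for human-written and LLM-generated samples; because they barely overlap, a single threshold separates the two classes well.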
In the DetectGPT team's initial experiments, the tool correctly identified human-written and LLM-generated text 95% of the time when using GPT3-NeoX, a powerful open-source variant of OpenAI's GPT models. DetectGPT was also able to detect human-written and LLM-generated text using LLMs other than the original source model, though with slightly lower accuracy. At the time of the initial experiments, ChatGPT was not yet available for direct testing.
Other companies and teams are also looking for ways to identify AI-written text. OpenAI, for example, has already released a new text classifier. However, Mitchell does not want to compare OpenAI's results directly with DetectGPT's, because there is no standardized evaluation dataset. His team did run experiments with OpenAI's previous-generation pre-trained AI detector and found that it performed well on English-language news articles, handled medical articles poorly, and failed completely on news articles in German. According to Mitchell, such mixed results are typical of detectors that depend on their initial training data. DetectGPT, by contrast, performed satisfactorily on all three of these text categories.
Feedback from DetectGPT users has already helped identify some security loopholes. For example, a person can use ChatGPT to evade detection by specifically asking the LLM to write text like a human. Mitchell's team already has several ideas for mitigating this weakness, but they have not yet been tested.
Another problem is that students using LLMs such as ChatGPT may simply edit the AI-generated text to evade detection and cheat on assignments. Mitchell and his team examined this possibility in their work and found that although the quality of detection of edited text decreases, the system still does a fairly good job of identifying machine-generated text when fewer than 10-15% of the words have been changed.
In the long term, the goal of DetectGPT is to provide society with a reliable, efficient tool for predicting whether a text, or even part of one, was machine-generated. Even if the tool does not judge an entire essay or news article to be machine-written, there is a need for a tool that can flag a paragraph or sentence that looks particularly machine-generated.
It is worth emphasizing that, according to Mitchell, there are many legitimate applications of LLMs in education, journalism, and other areas. However, providing the public with tools for verifying the source of information has always been valuable, and remains so in the AI era.
DetectGPT is just one of several projects Mitchell has built around LLMs. Last year he also published several approaches to editing LLMs, as well as a strategy called “self-destructing models” that disables an LLM when someone tries to use it for malicious purposes.
Mitchell hopes to improve each of these strategies at least once more before completing his PhD.
The study was published on the arXiv preprint server.