AI Systems May Be on the Verge of Collapsing into Nonsense, Scientists Warn
"Model collapse" could make ChatGPT and other AI less useful within a couple of years.
Researchers warn that artificial intelligence systems could become nonsensical as the Internet becomes flooded with content created by those same systems.
The rise of AI in generating text, sound, and images has created a wave of opportunities. For many, it's an ideal tool for creating content quickly and on a large scale.
However, many companies producing these models use text extracted from the Internet to train them. This can create a loop in which AI systems are trained on text that AI systems themselves produced.
This could quickly cause these AI tools to descend into gibberish and nonsense, according to researchers in a new article published in the journal Nature. This aligns with the "dead Internet theory," which suggests that an increasing portion of the web is becoming automated in what could be a vicious cycle, making it limited and irrelevant.
According to the study, just a few cycles of content generation and learning from it are enough for the systems to produce absurd and repetitive results. In some cases, it is estimated that it takes only nine generations to reach repetitive and incoherent outcomes.
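The feedback loop can be illustrated with a toy simulation. This is only a sketch, not the method used in the study: the "model" here is assumed to be nothing more than a table of word frequencies, and the names `next_generation` and `simulate`, along with the vocabulary and corpus sizes, are invented for illustration. Each generation learns the frequencies from the previous generation's text and then produces a new text by sampling from them. A word that fails to be sampled once can never come back, so the vocabulary can only shrink, which mirrors the drift toward repetitive output.

```python
import random
from collections import Counter


def next_generation(corpus, rng):
    """Fit a toy 'model' (word frequencies) to corpus, then sample a new corpus from it."""
    counts = Counter(corpus)
    words = list(counts)
    weights = [counts[w] for w in words]
    # The new text is the same length and contains only words the model has seen.
    return rng.choices(words, weights=weights, k=len(corpus))


def simulate(generations=9, vocab_size=50, corpus_size=100, seed=0):
    """Return the number of distinct words in each generation, starting with the original text."""
    rng = random.Random(seed)
    # Generation 0 stands in for human-written text (hypothetical data).
    corpus = [f"word{rng.randrange(vocab_size)}" for _ in range(corpus_size)]
    distinct = [len(set(corpus))]
    for _ in range(generations):
        corpus = next_generation(corpus, rng)
        distinct.append(len(set(corpus)))
    return distinct


if __name__ == "__main__":
    for gen, n in enumerate(simulate()):
        print(f"generation {gen}: {n} distinct words")
```

Running the script prints the vocabulary size for the original text and for nine generations trained on their predecessors' output. Real language models are far more complex, but the same principle applies: rare material is the first to disappear.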
The researchers warn that the problem "must be taken seriously if we want to maintain the benefits of training on large scale data extracted from the web."
If it became known that the quality of AI-generated content deteriorates over time, people's trust in AI tools could wane. People might then become more skeptical about the information they find on the Internet. Arguably, that should already be the case, with people turning to trusted sources for information.
The issue could be addressed with several possible solutions, such as watermarking AI-generated results so that automated systems can detect them and filter them out of training sets. Additionally, there is an undeniable need for content creators to continue feeding the datasets with which AI is trained. It is estimated that in just a couple of years, AIs will have used all of the text present on the Internet, including literary works.
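The watermark-and-filter idea can be sketched in a few lines. This is a simplified illustration under stated assumptions: real watermarks are typically statistical signals hidden in the text rather than a visible tag, so the `AI_MARKER` string and the functions `is_watermarked` and `filter_training_set` are hypothetical stand-ins, not an existing tool or standard.

```python
# Hypothetical visible tag standing in for a real (usually hidden) watermark.
AI_MARKER = "[ai-generated]"


def is_watermarked(text):
    """Stand-in detector: reports whether the text carries the hypothetical marker."""
    return AI_MARKER in text


def filter_training_set(documents):
    """Keep only the documents that the detector does not flag as AI-generated."""
    return [doc for doc in documents if not is_watermarked(doc)]


if __name__ == "__main__":
    scraped = [
        "A reporter's eyewitness account of the launch.",
        "[ai-generated] Ten amazing facts about rockets.",
        "Minutes of the town council meeting.",
    ]
    for doc in filter_training_set(scraped):
        print(doc)
```

The filter is only as good as the detector behind it: unmarked AI text would still slip into the training set.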