Synthetic data has its limits: Why human-sourced data can help prevent AI model collapse


Wow, how quickly the tables turn in the world of technology. Just two years ago, AI was hailed as the “next transformative technology to rule them all.” Now, instead of reaching Skynet levels and dominating the world, AI is, ironically, degrading.

AI, once the harbinger of a new era of intelligence, is now stumbling over its own code, struggling to live up to the brilliance it promised. But why exactly? The simple fact is that we are depriving AI of the only thing that makes it truly intelligent: human-generated data.

To feed these data-hungry models, researchers and organizations have increasingly turned to synthetic data. While this practice has long been a staple of AI development, we are now entering dangerous territory by over-relying on it, causing a gradual degradation of AI models. And this is not merely a minor concern about ChatGPT producing poorer results: the consequences are far more dangerous.

When AI models are trained on results generated by previous iterations, they tend to propagate errors and introduce noise, leading to a decrease in the quality of the results. This recursive process turns the well-known “garbage in, garbage out” cycle into a self-perpetuating problem, significantly reducing the effectiveness of the system. As AI moves away from human understanding and precision, it not only undermines performance but also raises critical concerns about the long-term viability of relying on self-generated data for continued AI development.
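
To make the recursion concrete, consider a stylized toy simulation (an illustration under assumed conditions, not the setup of any particular study): each "generation" fits a simple Gaussian model to samples drawn from the previous generation, discarding the most extreme 5% of samples to mimic how generative models under-represent rare events. The spread collapses within a handful of generations, and the tails, which carry the nuance, vanish first.

```python
# Toy simulation of recursive training on self-generated data.
# Each generation fits a Gaussian to the previous generation's output,
# dropping the most extreme 5% of samples to mimic how generative
# models under-represent rare events. Watch the spread collapse.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=10_000)  # generation 0: "human" data

for gen in range(10):
    mu, sigma = data.mean(), data.std()
    print(f"generation {gen}: std = {sigma:.3f}")
    samples = rng.normal(mu, sigma, size=10_000)
    # The tails ("outlier data") are the first casualty of each cycle.
    lo, hi = np.quantile(samples, [0.025, 0.975])
    data = samples[(samples > lo) & (samples < hi)]
```

In a run like this, the standard deviation shrinks by roughly 13% per cycle, collapsing to less than a third of its original value within ten generations: the model converges on an ever-narrower caricature of the original data.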

But this is not just a degradation of technology; it is a degradation of the reality, identity, and authenticity of data, and it poses serious risks to humanity and society. The knock-on effects could be profound, leading to a rise in critical errors. As these models lose accuracy and reliability, the consequences could be dire: think medical misdiagnoses, financial losses, and even life-threatening accidents.

Another important implication is that AI development could stall completely, leaving AI systems unable to assimilate new data and essentially becoming “stuck in time.” This stagnation would not only hinder progress but also trap AI in a cycle of diminishing returns, with potentially catastrophic effects for technology and society.

But, in practice, what can companies do to ensure the safety of their customers and users? Before answering that question, we need to understand how this all works.

When a model collapses, reliability is lost

The more AI-generated content spreads online, the faster it infiltrates data sets and, subsequently, the models themselves. And it is happening at a rapid pace, making it increasingly difficult for developers to ensure that only pure, human-created data makes it into training sets. The fact is that the use of synthetic content in training can trigger a detrimental phenomenon known as "model collapse" or "Model Autophagy Disorder (MAD)."

Model collapse is the degenerative process by which AI systems progressively lose track of the true underlying data distribution they are intended to model. It typically occurs when AI is trained recursively on content it generated itself, leading to a number of problems:

  • Loss of nuances: Models begin to forget outlier data or underrepresented information, crucial for a comprehensive understanding of any data set.
  • Reduced diversity: There is a notable decrease in the diversity and quality of the results the models produce (a simple way to quantify this is sketched after this list).
  • Bias amplification: Existing biases, particularly against marginalized groups, may be exacerbated if the model ignores nuanced data that could mitigate these biases.
  • Generation of meaningless results: Over time, models can begin to produce outputs that are irrelevant or nonsensical.
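
The "reduced diversity" symptom can be put on a dial. One common heuristic (an illustrative choice here, not a metric prescribed by any of the studies above) is the distinct-n ratio: the fraction of n-grams in a model's outputs that are unique. A score that falls generation over generation is an early warning sign. A minimal sketch:

```python
# Distinct-n ratio: unique n-grams divided by total n-grams.
# A falling score across training generations signals shrinking
# output diversity (the example texts below are hypothetical).
def distinct_n(texts: list[str], n: int = 2) -> float:
    ngrams = []
    for text in texts:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# Hypothetical outputs from an early and a late training generation.
gen_1 = ["the cat sat on the mat", "a dog ran through the park"]
gen_9 = ["the the the mat mat", "the the mat the the"]
print(distinct_n(gen_1))  # 1.0 -- every bigram is unique
print(distinct_n(gen_9))  # 0.5 -- half the bigrams are repeats
```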

A good example: a study published in Nature highlighted the rapid degeneration of language models trained recursively on AI-generated text. By the ninth iteration, these models were found to be producing completely irrelevant and meaningless content, demonstrating the rapid decline in data quality and model usefulness.

Safeguarding the future of AI: steps businesses can take today

Enterprise organizations are in a unique position to shape the future of AI responsibly, and there are clear, practical steps they can take to keep AI systems accurate and reliable:

  • Invest in data provenance tools: Tools that track where each piece of data comes from and how it changes over time give companies confidence in what feeds their AI. With clear visibility into data sources, organizations can avoid feeding models unreliable or biased information (a minimal sketch of such a record follows this list).
  • Implement AI-powered filters to detect synthetic content: Advanced filters can flag AI-generated or low-quality content before it enters training data sets. These filters help ensure that models learn from authentic, human-created information rather than synthetic data that lacks real-world complexity.
  • Partner with trusted data providers: Strong relationships with vetted providers give organizations a steady supply of high-quality, authentic data. This means AI models gain real, nuanced insights that reflect actual scenarios, improving both performance and relevance.
  • Promote digital literacy and awareness: By educating teams and customers about the importance of data authenticity, organizations can help people recognize AI-generated content and understand the risks of synthetic data. Raising awareness of responsible data use fosters a culture that values accuracy and integrity in AI development.
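
What might the provenance bullet look like in practice? Below is a minimal sketch using only Python's standard library; the field names, origin labels, and helper are illustrative assumptions, and a production system would pair this with a dedicated data catalog or lineage tool rather than rolling its own. Each training example carries its source, an origin label, and a content hash so tampering or relabeling can be detected later.

```python
# Minimal provenance record (illustrative field names, standard
# library only). Real deployments would use a data catalog or
# lineage tool; this only shows the gating idea.
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ProvenanceRecord:
    source: str          # e.g. "vetted-provider-A" (hypothetical)
    origin: str          # "human" or "synthetic" (hypothetical labels)
    content_sha256: str  # hash for tamper/relabel detection
    collected_at: str    # UTC timestamp of ingestion

def record_for(content: str, source: str, origin: str) -> ProvenanceRecord:
    return ProvenanceRecord(
        source=source,
        origin=origin,
        content_sha256=hashlib.sha256(content.encode()).hexdigest(),
        collected_at=datetime.now(timezone.utc).isoformat(),
    )

# Gate the training set on provenance: keep human-sourced text only.
text = "Quarterly results beat analyst forecasts."
corpus = [(text, record_for(text, "vetted-provider-A", "human"))]
training_set = [t for t, rec in corpus if rec.origin == "human"]
```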

The future of AI depends on responsible action. Companies have a real opportunity to keep AI grounded in accuracy and integrity. By choosing real, human-sourced data over shortcuts, prioritizing tools that detect and filter low-quality content, and fostering awareness around digital authenticity, organizations can put AI on a safer, smarter path. Let’s focus on building a future where AI is powerful and genuinely beneficial to society.

Rick Song is the CEO and co-founder of Persona.



