AI’s Insatiable Appetite: The Looming Data Drought and the Controversial Quest for Alternative Sources



AI firms have trained their systems on nearly the whole internet and are now running short of data

In the effort to make each Large Language Model (LLM) better than its predecessor, AI firms have nearly exhausted the freely available internet and now face a data shortage. They may have to train their upcoming models on AI-generated data, which brings its own set of problems.

AI companies are confronting a massive problem that could render the billions of dollars major tech firms have invested in them worthless: they are running out of internet to train on.

In their quest to build ever larger and more capable language models, AI firms have scraped virtually the entire open internet. Now they are on the brink of a data shortage, according to the Wall Street Journal.

The problem is driving some companies to explore alternative sources of training data, such as publicly accessible video transcripts and AI-generated "synthetic data". Using AI-produced data to train AI models, however, presents its own difficulties: it increases the likelihood that the models will hallucinate.

Moreover, the discussion around synthetic data has raised serious concerns about the consequences of training AI models on AI-generated output. Experts suggest that over-reliance on such data may lead to digital "inbreeding", which could ultimately cause a model to collapse in on itself.

Companies like Dataology, founded by Meta and Google DeepMind alumnus Ari Morcos, are investigating techniques to train capable models with less data and compute. Most of the major players, however, are experimenting with more unusual and controversial approaches to sourcing training data.

According to reporting cited by the Wall Street Journal, OpenAI is considering transcribing publicly accessible YouTube videos to train its forthcoming GPT-5 model. But the firm has already faced backlash for using a similar approach with its Sora model, and could face legal action from video creators.

While their specific techniques remain unclear, firms such as OpenAI and Anthropic plan to tackle the problem by generating high-quality synthetic data.

Warnings that AI could run out of training data have been circulating for a while. Although some, such as Epoch researcher Pablo Villalobos, forecast that AI could exhaust its viable training data within the next few years, the widespread belief is that major breakthroughs could allay these worries.

There is, however, another way to address the problem: AI firms could step off the path of building ever bigger, more sophisticated models, given the environmental cost of creating them, including considerable energy use and a reliance on scarce minerals for computing chips.

(Incorporating information from various sources)


Firstpost retains all rights, as per copyright laws, as of 202
