AI is getting ever more powerful, which makes it harder to judge how intelligent models really are

How do you judge an AI model when it starts outperforming human beings? That is the challenge facing researchers such as Russell Wald, executive director of the Stanford Institute for Human-Centered Artificial Intelligence (HAI).

“As of 2024, there are very few task categories in which human ability exceeds AI, and even in these areas, the performance gap between AI and humans is shrinking quickly,” Wald said last week in a presentation at the Fortune Brainstorm AI Singapore conference. “AI is exceeding human capabilities, and it is becoming increasingly difficult for us to benchmark.”

HAI releases the AI Index every year, which aims to provide a comprehensive, data-driven snapshot of where AI stands today. At Fortune Brainstorm AI Singapore, Wald shared some highlights from the 2025 edition of the AI Index, such as the increasing power of current models, industry's growing dominance of the AI frontier, and how China is poised to overtake the United States.


The following transcript has been lightly edited for concision and clarity.

I am Russell Wald, the executive director of the Stanford Institute for Human-Centered Artificial Intelligence, or what we call “HAI.”

We are Stanford's global interdisciplinary research institute at the forefront of shaping AI for the public good. HAI was established in 2019 with the goal of advancing AI research, education, policy, and practice. And, through our role as a convener and our rigorous study of AI, we have become the trusted partner on AI governance for decision-makers in industry, government, and civil society.

I am going to talk about something we produce at HAI, the AI Index, an annual analysis of trends that tracks the research, development, deployment, and socioeconomic impact of AI across academia, government, and industry.

We see AI performance improving steadily year after year. We used Midjourney, a text-to-image generator, asking it for a hyperrealistic image of Harry Potter. And from February 2022 to July 2024, we see rapid improvement in these generated images.

In 2022, the model produced cartoonish and inaccurate renderings of Harry Potter, but by 2024, it could create strikingly realistic depictions. We have gone from what looks like a Picasso painting to an uncanny likeness of Daniel Radcliffe, the actor who played Harry Potter in the movies.

Because of this steady growth in performance, we are increasingly challenged when it comes to benchmarking these models. As of 2024, there are very few task categories in which human ability exceeds AI, and even in these areas, the performance gap between AI and humans is shrinking quickly. From image recognition to competition-level mathematics to PhD-level science questions, AI is exceeding human capabilities, and it is becoming increasingly difficult for us to benchmark.

From health care to transportation, AI is quickly moving from the lab into our daily lives. In 2023, the U.S. Food and Drug Administration approved 223 AI-enabled medical devices, compared with only six in 2015.

On the roads, self-driving cars are no longer experimental. For example, Waymo, which I ride regularly since I live in San Francisco, is one of the largest U.S. operators and provides more than 150,000 autonomous rides every week, while Baidu's affordable robotaxi fleet now serves numerous cities in China.

Business use of AI increased significantly after stagnating from 2017 to 2023. McKinsey’s latest report reveals that 78% of respondents say their organizations have begun using AI in at least one business function, a significant increase from 55% in 2023.

Driven by increasingly capable small models, the inference cost for a system performing at the level of GPT-3.5 dropped more than 280-fold between November 2022 and October 2024. Hardware costs have declined by 30% per year, while energy efficiency has improved by 40% each year.

Open-weight models are also closing the gap with closed models, narrowing the performance gap from 8% to 1.7% on some benchmarks in a single year. Together, these trends are rapidly lowering the barriers to advanced AI.

However, even as inference and hardware costs fall, training costs remain out of reach for academia and most small players. Nearly 90% of notable AI models in 2024 came from industry, up from 60% in 2023. And although academia remains a leading source of highly cited research, it is struggling at this point to keep up at the frontier level.

Model scale continues to grow rapidly. Training compute doubles every five months, datasets every eight, and power use annually. However, performance gaps are narrowing. The score difference between the top-ranked and 10th-ranked models fell from 11.9% to 5.4% in one year, and the top two models are now separated by only 0.7%. The frontier is increasingly competitive and increasingly crowded.

In recent years, AI model performance at the frontier has converged, with multiple providers now offering highly capable models. This marks a shift from late 2022, when the launch of ChatGPT, widely seen as AI's breakthrough into public consciousness, coincided with a landscape dominated by only two players: OpenAI and Google.

One of the most important things to keep in mind is that the Transformer model cost Google $930 to train in 2017, and that is the T in GPT, the baseline architecture, and today we are at $200 million to train Gemini Ultra.

Last year’s Index was among the first publications to highlight the lack of standardized benchmarks for AI safety and responsibility evaluations. The Index has also been analyzing global public opinion. If you are from a non-Western industrialized nation, you are more likely to view AI positively. China has a positive view at 83%, Indonesia at 80%, and Thailand at 77%, while Canada is at 40%, the U.S. at 39%, and the Netherlands at 36%.

I will close with the geopolitical situation. The United States still maintains a lead in AI, closely followed by China. However, that gap is narrowing. My intent is not to fuel the idea of an arms race between China and the United States, but to highlight the differing approaches among the developers of the most advanced frontier models.

In recent years, the United States has relied on a handful of proprietary model providers. Meanwhile, China has invested heavily in its talent base and, more importantly, in an open-source ecosystem. If this trend continues, and I were to appear next year, at this rate China would overtake the U.S. in terms of model performance.
