OpenAI Skips o2 and Releases the New o3 'Reasoning' Model

The final day of OpenAI’s “12 Days of Shipmas” came with the unveiling of o3, a new chain-of-thought “reasoning” model that the company says is its most advanced yet. The model is not yet available for general use, but security researchers can sign up for a preview starting today.

OpenAI and others hope that reasoning models will go a long way toward solving the pernicious problem of chatbots frequently producing incorrect answers. Basically, chatbots do not “think” like humans and different techniques are needed to try to create the best simulation of a human thought process.

When asked a question, reasoning models pause and consider related questions that could help produce an accurate answer. For example, if you ask the o3 model, “can habaneros be grown in the Pacific Northwest?”, the model could lay out a series of questions to investigate before reaching a conclusion, such as “where do habaneros typically grow?”, “what are the ideal conditions for growing habaneros?” and “what kind of climate does the Pacific Northwest have?” Anyone who has used chatbots knows that you sometimes need to prompt a chatbot with additional follow-ups until you finally get the right result. Reasoning models are supposed to do this extra work for you.

o3 is the successor to o1, OpenAI’s first chain-of-thought reasoning model. Representatives said they decided to skip the “o2” name “out of respect” for the British telecommunications company O2, but it certainly doesn’t hurt that it makes the product sound more advanced. The company says the new model comes with the ability to adjust its reasoning time. Users can choose a low, medium or high reasoning time; the more compute it uses, the better o3 is supposed to perform. OpenAI says it will spend time “red-teaming” the new model with researchers to prevent it from producing potentially harmful responses (since, again, it is not a human and does not distinguish between good and evil).

Reasoning is the buzzword of the day in the field of generative AI, as industry experts believe it is the next unlock needed to improve the performance of large language models. After all, adding more compute no longer delivers equivalent performance gains, so new techniques are needed. Google DeepMind recently introduced its own reasoning model called Gemini Deep Research, which can take 5-10 minutes to generate a report that analyzes many sources on the web to reach its conclusions.

OpenAI is confident in o3 and points to impressive benchmarks: it says that on a Codeforces test, which measures coding ability, o3 scored 2727. For context, a score of 2400 would place an engineer in the 99th percentile of programmers. The model also scored 96.7% on the 2024 American Invitational Mathematics Exam, missing only one question. We’ll have to see how the model performs in real-world tests, and in general, it’s still not a good idea to rely too much on AI models for important tasks where accuracy is essential. But optimists are confident that the accuracy problem is being solved. Hopefully so, because as things stand, Google’s AI overviews in search are still the subject of frequent mockery on social media.

AI model companies like OpenAI and Perplexity are in a race to become the next Google, collecting the world’s knowledge and helping users make sense of it all. They now even have search products aimed at more directly replicating Google, with access to real-time web results.

However, all of these players seem to leapfrog one another with each passing day. The feeling is somewhat reminiscent of the late ’90s, when there were countless search engines to choose from: Google, Yahoo, AltaVista and Ask Jeeves, to name a few, all sucking up data from the internet and presenting it with little more than a different user experience. Most of them disappeared once one emerged that was vastly better than the rest: Google.

OpenAI clearly has a strong lead right now, with hundreds of millions of monthly active users and a partnership with Apple, but Google has received a lot of plaudits recently for advancements in its Gemini models. The Verge reports that Google will soon integrate Gemini more deeply into its search interface.
