Useful information
Prime News delivers timely, accurate news and insights on global events, politics, business, and technology
OpenAI is joining an increasingly competitive enterprise voice market with its new model, gpt-realtime, which follows complex instructions and offers voices “that sound more natural and expressive.”
As voice AI continues to grow, and customers find use cases such as customer service calls or real-time translation, the market for realistic-sounding voices that also offer enterprise-grade security is heating up. OpenAI says its new model provides a more human-sounding voice, but it still needs to compete against companies such as ElevenLabs.
The model will be available in the Realtime API, which the company has also made generally available. Together with the gpt-realtime model, OpenAI also released new voices in the API, which it calls Cedar and Marin, and updated its other voices to work with the latest model.
OpenAI said in a livestream that it worked with customers who are building voice applications to train gpt-realtime and “carefully aligned the model to evals that are based on real-world scenarios such as customer service and academic tutoring.”
The company touted the model's ability to produce emotive, natural-sounding voices that also align with how developers build with the technology.
The model works within a speech-to-speech framework, which allows it to understand spoken prompts and respond vocally. Speech-to-speech models are ideal for real-time responses, where a person, typically a customer, interacts with an application.
For example, a customer wants to return some products and calls a customer service platform. They could be talking to a voice assistant that answers questions and handles requests as if they were talking to a human.
In a livestream, OpenAI customer T-Mobile showed an AI voice agent that helps people find new phones. Another customer, the real estate search platform Zillow, showed an agent that helps someone narrow down a neighborhood to find the perfect home.
OpenAI said that gpt-realtime is its “most advanced, production-ready voice model.” Like its other voice models, it can switch languages mid-sentence. However, OpenAI researchers noted that gpt-realtime can follow more complex instructions such as “speak empathetically in a French accent.”
But gpt-realtime faces competition from other models that many brands already use. ElevenLabs launched Conversational AI 2.0 in May. SoundHound partners with fast food franchises for voice-based drive-through ordering. Emerging startup Hume has launched its EVI 3 model, which allows users to generate versions of their own voice.
As companies discover more use cases for voice AI, even more general model providers that offer multimodal LLMs are making a case for themselves. Mistral launched its new Voxtral model, saying it would work well for real-time translation. Google is improving its audio capabilities and gaining popularity with an audio feature in NotebookLM that turns research notes into a podcast.
OpenAI said that gpt-realtime is smarter and better understands native audio, including the ability to pick up nonverbal cues such as laughs or sighs.
Benchmarking with the Big Bench Audio evaluation showed that the model scored 82.8% in accuracy, compared to its previous model, which scored 65.6%. OpenAI did not provide numbers that tested gpt-realtime against competitors' models.
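To put the two reported scores in perspective, the short Python sketch below works out the gap between them. The function name `compare_scores` is made up for illustration; only the 82.8% and 65.6% figures come from the article.

```python
def compare_scores(new_pct: float, old_pct: float) -> tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    abs_gain = new_pct - old_pct           # difference in percentage points
    rel_gain = abs_gain / old_pct * 100    # improvement relative to the old score
    return round(abs_gain, 1), round(rel_gain, 1)


if __name__ == "__main__":
    # Scores reported for the Big Bench Audio evaluation.
    points, relative = compare_scores(82.8, 65.6)
    print(f"Gain: {points} percentage points ({relative}% relative improvement)")
```

On these figures, the new model is 17.2 percentage points ahead, or roughly 26% better than its predecessor in relative terms.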
OpenAI focused on improving the model's instruction-following capabilities, ensuring that the model adheres to directions more effectively. The new model scores 30.5% on the MultiChallenge audio benchmark. The engineers also strengthened function calling so that gpt-realtime can access the correct tools.
To support the new model and improve how companies integrate real-time AI capabilities into their applications, OpenAI has added several new features to the Realtime API.
It now supports MCP and can recognize image inputs, which allows it to tell users what it sees in real time. This is a feature Google heavily emphasized during its Project Astra presentation last year.
The Realtime API can also handle Session Initiation Protocol (SIP). SIP connects applications to phones, such as the public telephone network or desk phones, opening up more contact center use cases. Users can also save and reuse prompts in the API.
So far, people are impressed with the model, although these are still early tests of a model that was only recently released.
OpenAI reduced gpt-realtime prices by 20% to $32 per million audio input tokens and $64 per million audio output tokens.
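As a rough illustration of what these rates mean in practice, the Python sketch below estimates the audio cost of a session and back-calculates the rates implied before the cut. The token counts in the usage example are hypothetical, and the sketch assumes the 20% reduction applies equally to input and output rates; only the $32, $64 and 20% figures come from the article.

```python
INPUT_USD_PER_MILLION = 32.0    # audio input rate reported in the article
OUTPUT_USD_PER_MILLION = 64.0   # audio output rate reported in the article
PRICE_CUT = 0.20                # reported reduction (assumed uniform across both rates)


def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the audio cost in USD for one session, rounded to cents."""
    cost = (input_tokens / 1_000_000) * INPUT_USD_PER_MILLION
    cost += (output_tokens / 1_000_000) * OUTPUT_USD_PER_MILLION
    return round(cost, 2)


def price_before_cut(current_price: float, cut: float = PRICE_CUT) -> float:
    """Back out the earlier price implied by a fractional cut to the current price."""
    return round(current_price / (1 - cut), 2)


if __name__ == "__main__":
    # Hypothetical session: 50,000 audio input tokens, 20,000 audio output tokens.
    print(f"Session cost: ${session_cost(50_000, 20_000)}")
    print(f"Implied earlier input rate: ${price_before_cut(INPUT_USD_PER_MILLION)} per million")
    print(f"Implied earlier output rate: ${price_before_cut(OUTPUT_USD_PER_MILLION)} per million")
```

Under that assumption, the earlier rates work out to $40 and $80 per million audio tokens. Actual bills would also depend on any text or cached tokens, which this sketch ignores.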