OpenAI Unveils Real-Time Audio Models for Conversational AI
- tech360.tv

- 2 hours ago
- 1 min read
OpenAI is set to introduce three audio models for its developer platform on Thursday, May 7, 2026, designed to make voice-based software agents more conversational and capable of completing tasks in real time. The launch, delivered through OpenAI's application programming interface (API), moves the ChatGPT maker beyond transcription and chat toward agents that can listen, translate, and act during live conversations.

The three models are GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
OpenAI stated the models are available to test in its developer playground. GPT-Realtime-2 manages harder requests, calls tools, handles interruptions, and maintains context across longer voice sessions.
GPT-Realtime-Translate supports translation from more than 70 languages into 13 output languages, targeting customer support, education, and other settings.
GPT-Realtime-Whisper provides live speech-to-text functionality. This allows captions, meeting notes, and workflow updates to be generated as a speaker talks.
Customers testing the models include online real estate marketplace Zillow, online travel agency Priceline, and European telecommunications firm Deutsche Telekom.
Pricing for GPT-Realtime-2 starts at USD 32 per million audio input tokens. GPT-Realtime-Translate costs USD 0.034 per minute, and GPT-Realtime-Whisper is USD 0.017 per minute.
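For a rough sense of what those rates mean in practice, the per-minute prices quoted above can be turned into session estimates. A minimal sketch, using only the figures from the article (the audio-token count passed to the GPT-Realtime-2 helper is an arbitrary illustration, since the article does not say how many tokens a minute of audio consumes):

```python
# Back-of-the-envelope cost estimates from the prices quoted in the article.
GPT_REALTIME_2_PER_M_INPUT_TOKENS = 32.0  # USD per million audio input tokens
TRANSLATE_PER_MINUTE = 0.034              # USD per minute
WHISPER_PER_MINUTE = 0.017                # USD per minute

def realtime2_input_cost(audio_input_tokens: int) -> float:
    """Input-side cost for GPT-Realtime-2 (output pricing is not quoted here)."""
    return audio_input_tokens / 1_000_000 * GPT_REALTIME_2_PER_M_INPUT_TOKENS

def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a session billed by the minute."""
    return minutes * rate_per_minute

# A one-hour translated support call vs. a one-hour captioned meeting:
translate_hour = per_minute_cost(60, TRANSLATE_PER_MINUTE)  # 2.04 USD
whisper_hour = per_minute_cost(60, WHISPER_PER_MINUTE)      # 1.02 USD
```

At these rates, an hour of live translation comes to about USD 2.04 and an hour of live captioning to about USD 1.02, while GPT-Realtime-2 costs depend on how much audio is tokenized as input.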
- OpenAI introduced three new audio models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
- The models are designed to enable real-time, conversational voice-based software agents.
- Key functionalities include managing complex requests, translating from more than 70 languages, and live speech-to-text.
Source: REUTERS
