OpenAI Unveils Real-Time Audio Models for Conversational AI
- tech360.tv

- 2 hours ago
- 1 min read
OpenAI is set to introduce three audio models for its developer platform on Thursday, May 7, 2026, designed to make voice-based software agents more conversational and capable of completing tasks in real time. The launch, delivered through OpenAI's application programming interface (API), moves the ChatGPT maker beyond transcription and chat toward agents that can listen, translate, and act during live conversations.

The three models are GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
OpenAI stated the models are available to test in its developer playground. GPT-Realtime-2 manages harder requests, calls tools, handles interruptions, and maintains context across longer voice sessions.
GPT-Realtime-Translate supports translation from more than 70 languages into 13 output languages, targeting customer support, education, and other settings.
GPT-Realtime-Whisper provides live speech-to-text functionality. This allows captions, meeting notes, and workflow updates to be generated as a speaker talks.
Customers testing the models include online real estate marketplace Zillow, online travel agency Priceline, and European telecommunications firm Deutsche Telekom.
Pricing for GPT-Realtime-2 starts at USD 32 per million audio input tokens. GPT-Realtime-Translate costs USD 0.034 per minute, and GPT-Realtime-Whisper is USD 0.017 per minute.
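For a rough sense of what those rates mean in practice, the per-minute prices quoted above can be turned into session estimates. A minimal sketch, using only the figures from the article (the audio-token count passed to the GPT-Realtime-2 helper is an arbitrary illustration, since the article does not say how many tokens a minute of audio consumes):

```python
# Back-of-the-envelope cost estimates from the prices quoted in the article.
GPT_REALTIME_2_PER_M_INPUT_TOKENS = 32.0  # USD per million audio input tokens
TRANSLATE_PER_MINUTE = 0.034              # USD per minute
WHISPER_PER_MINUTE = 0.017                # USD per minute

def realtime2_input_cost(audio_input_tokens: int) -> float:
    """Input-side cost for GPT-Realtime-2 (output pricing is not quoted here)."""
    return audio_input_tokens / 1_000_000 * GPT_REALTIME_2_PER_M_INPUT_TOKENS

def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Cost of a session billed by the minute."""
    return minutes * rate_per_minute

# A one-hour translated support call vs. a one-hour captioned meeting:
translate_hour = per_minute_cost(60, TRANSLATE_PER_MINUTE)  # 2.04 USD
whisper_hour = per_minute_cost(60, WHISPER_PER_MINUTE)      # 1.02 USD
```

At these rates, an hour of live translation comes to about USD 2.04 and an hour of live captioning to about USD 1.02, while GPT-Realtime-2 costs depend on how much audio is tokenized as input.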
- OpenAI introduced three new audio models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
- The models are designed to enable real-time, conversational voice-based software agents.
- Key functionalities include managing complex requests, translating from more than 70 languages, and live speech-to-text.
Source: REUTERS
