OpenAI's latest launch (May 2026) expands its Realtime API with three new models designed to transform voice from a simple interface into a proactive agent that can reason and act in real time.
1. The New Voice "Arsenal"
The core of this update features specialized models for different use cases:
- GPT-Realtime-2: A flagship model with GPT-5-class reasoning that handles complex requests, tool use, and natural interruptions in real time.
- GPT-Realtime-Translate: Supports over 70 input languages with live, conversational translation.
- GPT-Realtime-Whisper: Provides ultra-low-latency streaming speech-to-text.
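Assuming these models plug into the Realtime API's existing session flow, selecting one would likely come down to naming it in the session configuration event. The sketch below builds such a payload; the lowercase model identifiers and the exact session fields (`model`, `voice`, `modalities`) are assumptions modeled on the current API, not confirmed values for this launch.

```python
import json

# Hypothetical session configuration for the Realtime API.
# Model names are lowercased from the announcement above; the event
# shape mirrors today's "session.update" event and is an assumption.
def build_session_update(model: str, voice: str = "alloy") -> dict:
    return {
        "type": "session.update",
        "session": {
            "model": model,                      # e.g. "gpt-realtime-2"
            "voice": voice,                      # one of the preset voices
            "modalities": ["audio", "text"],     # speech in, speech/text out
        },
    }

payload = build_session_update("gpt-realtime-2")
print(json.dumps(payload, indent=2))
```

Swapping the model string (e.g. to a translation or transcription model) would be the only change needed to target a different use case, which is presumably the point of shipping them behind one API.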
2. Advanced Capabilities for Developers
These models enable more "human" interactions through:
- Audible Transparency: The AI can narrate its progress aloud while executing background tasks.
- Configurable Performance: Developers can tune reasoning effort and context windows (up to 128K tokens) to balance latency with intelligence.
- Emotional Nuance: Controllable tone and delivery allow for more empathetic or upbeat AI personas.
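The "Configurable Performance" bullet above implies two tunable knobs: reasoning effort and context window size. A minimal sketch of how a developer might express that trade-off, assuming hypothetical `reasoning_effort` and `max_context_tokens` session fields (neither is a confirmed parameter name):

```python
# Hypothetical tuning helper for the latency-vs-intelligence trade-off.
# Field names ("reasoning_effort", "max_context_tokens") are assumptions;
# the 128K cap comes from the announcement above.
def tune_session(effort: str = "medium", context_tokens: int = 128_000) -> dict:
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning effort: {effort}")
    if context_tokens > 128_000:
        raise ValueError("context window is capped at 128K tokens")
    return {
        "type": "session.update",
        "session": {
            "reasoning_effort": effort,          # lower = faster responses
            "max_context_tokens": context_tokens,  # smaller = cheaper, faster
        },
    }

# A voice agent that must answer instantly might pick low effort and a
# small window; a deliberate planning agent would do the opposite.
fast = tune_session("low", 16_000)
smart = tune_session("high", 128_000)
```

The validation mirrors the constraints the announcement states: effort is a discrete setting and the context window tops out at 128K tokens.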
3. Safety & Hardware Strategy
This update aligns with a push toward ambient, hardware-integrated AI:
- Integrated Safety: Real-time classifiers detect harmful content, and a fixed set of preset voices prevents impersonation.
- Hardware Ambitions: The models are designed to power upcoming, potentially screenless, AI-first hardware.