
GPT-4o (“o” for “omni”) is our versatile, high-intelligence flagship model. It accepts both text and image inputs, and produces text outputs (including Structured Outputs). It is the best model for most tasks, and is our most capable model outside of our o-series models.
Hi everyone!
Voice is the future, and OpenAI's new audio models are accelerating that shift! They've just launched three new models in their API:
🎤 gpt-4o-transcribe & gpt-4o-mini-transcribe (STT): OpenAI reports lower word error rates than its Whisper models, even in noisy environments. Great for call centers, meeting transcription, and more.
🗣️ gpt-4o-mini-tts (TTS): This is the game-changer. Steerable voice output – you control the style and tone! Think truly personalized voice agents.
🛠️ Easy Integration: Works with the OpenAI API and Agents SDK, supporting both speech-to-speech and chained (speech-to-text → LLM → text-to-speech) voice agents.
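
For anyone curious what the chained flow might look like in code, here is a rough Python sketch, not an official recipe. It assumes the `openai` Python package and a `client` created with `OpenAI()`. The helper names (`build_tts_request`, `transcribe`, `speak`, `chained_turn`), the `coral` voice, and the `reply_fn` callback are my own illustrative choices, so check the current API reference before relying on any of them.

```python
from pathlib import Path
from typing import Callable

STT_MODEL = "gpt-4o-mini-transcribe"  # speech-to-text model named in the post
TTS_MODEL = "gpt-4o-mini-tts"         # steerable text-to-speech model named in the post


def build_tts_request(text: str, style: str, voice: str = "coral") -> dict:
    """Build keyword arguments for a steerable TTS call (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": TTS_MODEL,
        "voice": voice,          # assumption: "coral" is just one possible voice
        "input": text,
        "instructions": style,   # free-text description of tone and style
    }


def transcribe(client, audio_path: Path) -> str:
    """Speech-to-text: send an audio file and return the transcript text."""
    with audio_path.open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=STT_MODEL, file=audio_file)
    return result.text


def speak(client, text: str, style: str, out_path: Path) -> None:
    """Text-to-speech: stream the synthesized audio to a file."""
    request = build_tts_request(text, style)
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)


def chained_turn(client, audio_in: Path, audio_out: Path,
                 reply_fn: Callable[[str], str], style: str) -> str:
    """One chained turn: transcribe, produce a text reply, then speak it."""
    user_text = transcribe(client, audio_in)
    reply_text = reply_fn(user_text)  # e.g. a call to a chat model, left abstract here
    speak(client, reply_text, style, audio_out)
    return reply_text
```

To try it, you would create `client = OpenAI()` (with `OPENAI_API_KEY` set) and call something like `chained_turn(client, Path("question.wav"), Path("answer.mp3"), my_reply_fn, "Warm, upbeat, a little playful")`. Keeping the request builder separate from the network call makes the style instructions easy to inspect and tweak.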
Experience the steerable TTS for yourself: OpenAI.fm
Visla
Incredible upgrade from OpenAI—GPT-4o’s boost in speech-to-text accuracy is a big leap forward, and steerable TTS opens up so many creative and practical use cases. Voice agents just got a serious level-up. Can’t wait to see what devs build with this!
Impressed by the speech accuracy and API ease! 👀