
New OpenAI audio models for developers: gpt-4o-powered speech-to-text (more accurate than Whisper) and steerable text-to-speech. Build voice agents, transcription pipelines, and more.
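As a rough sketch of how such models are typically called through the OpenAI Python SDK: the model names (`gpt-4o-transcribe`, `gpt-4o-mini-tts`) and the `instructions` steering parameter are assumptions based on this announcement, not guaranteed by the text above.

```python
# Minimal sketch, assuming the OpenAI Python SDK and the audio model
# names from the announcement (gpt-4o-transcribe, gpt-4o-mini-tts).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Speech-to-text: transcribe a local audio file.
with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # assumed model name
        file=audio_file,
    )
print(transcript.text)

# Steerable text-to-speech: the `instructions` argument (assumed
# parameter) controls delivery, e.g. tone and pacing.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",  # assumed model name
    voice="coral",
    input="Thanks for calling! How can I help you today?",
    instructions="Speak in a warm, upbeat customer-service tone.",
)
speech.write_to_file("greeting.mp3")
```

The steering text is free-form, which is what distinguishes these models from fixed-voice TTS: the same input string can be rendered in different styles by changing only the instructions.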
GPT-4 is the latest milestone in OpenAI’s effort to scale up deep learning. It is a large multimodal model (accepting image and text inputs and emitting text outputs) that exhibits human-level performance on various professional and academic benchmarks.
GPT-4.5 is our most advanced model yet, scaling up unsupervised learning for better pattern recognition, deeper knowledge, and fewer hallucinations. It feels more natural, understands intent better, and excels at writing, programming, and problem-solving.
A new set of APIs and tools specifically designed to simplify the development of agentic applications.
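To make that concrete, here is a minimal sketch of an agentic application built with the `openai-agents` Python package; the package name and the `Agent`/`Runner` interface are assumptions based on this announcement and should be checked against current documentation.

```python
# Minimal sketch, assuming the `openai-agents` package and its
# Agent/Runner interface.
from agents import Agent, Runner

# An agent bundles a name, instructions, and (optionally) tools.
agent = Agent(
    name="Support assistant",
    instructions="Answer billing questions concisely and politely.",
)

# Runner drives the agent loop (model calls, tool calls) to completion.
result = Runner.run_sync(agent, "Why was I charged twice this month?")
print(result.final_output)
```

The point of the abstraction is that the loop of model invocation and tool execution is handled by the runner, so application code only declares the agent's behavior and supplies the user input.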
GPT-4o (“o” for “omni”) is a step toward much more natural human-computer interaction: it accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image as output.
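For example, mixed text-and-image input can be sent to GPT-4o through the chat completions endpoint of the OpenAI Python SDK. This sketch covers only the text+image case (audio is omitted); the image URL is a placeholder.

```python
# Sketch of a multimodal (text + image) request to gpt-4o.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # A single user turn can mix content parts of different types.
            "content": [
                {"type": "text", "text": "What is unusual about this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},  # placeholder
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```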