Zac Zuo

OpenAI o3 and o4-mini - Advanced Visual Reasoning & Agentic Tool Use

OpenAI o3 and o4-mini are reasoning models that think with images & agentically use tools (Search, Code, DALL-E). SOTA multimodal performance. Available in ChatGPT & API.

Zac Zuo

Hi everyone!


OpenAI has released its next-generation reasoning models, o3 and o4-mini, and they come with two major advancements.


First, they introduce "Thinking with Images". These models don't just see images; they can apparently manipulate them (crop, zoom, rotate via internal tools) as part of their reasoning process to solve complex visual problems, even with imperfect inputs like rough diagrams or handwriting. This pushes them to SOTA on multimodal benchmarks.
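For anyone wanting to poke at the image side via the API, it follows the usual multimodal input pattern with the official Python SDK. Just a minimal sketch, assuming image input works for these models the same way it does for other vision-capable models; the image URL and prompt are placeholders:

```python
# Minimal sketch: sending an image to a reasoning model via the OpenAI Python SDK.
# The image URL and prompt below are placeholders, not from the announcement.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",  # or "o3" for the more powerful model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this whiteboard sketch describe?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/whiteboard.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```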

Second, building on that, they have full agentic tool access. They intelligently decide when and how to use all the available tools: Search, Code Interpreter, DALL-E, and the new image manipulation capabilities, often chaining them together to tackle multi-faceted tasks. (A quick sketch of what that looks like from the API side follows below.)
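On the API side, letting the model decide when to use a tool is standard function calling: you describe your tools and the model chooses whether and how to invoke them. A hedged sketch; the `get_weather` tool here is a made-up example for illustration, not something shipped with the release:

```python
# Sketch: letting the model decide when to call a tool (standard function calling).
# The get_weather tool is a hypothetical example, not part of the o3/o4-mini release.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="o4-mini",  # placeholder choice; "o3" works the same way
    messages=[{"role": "user", "content": "Do I need an umbrella in Tokyo today?"}],
    tools=tools,  # the model decides whether and how to call get_weather
)

# If the model chose to call the tool, the call details show up here.
print(response.choices[0].message.tool_calls)
```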


Quick rundown of the difference (because the naming is a puzzle, right? :)):

  • o3: The most powerful, particularly excels at complex visual reasoning.

  • o4-mini: Optimized for speed and efficiency, still very capable across tasks.

Both models have better instruction following and hold more natural conversations by drawing on memory. They're rolling out now in ChatGPT and are available via the API. (OpenAI also released the open-source Codex CLI alongside them.)

Jun Shen

Loving the visual reasoning features! 👍

Islam Akramov

The capabilities of o3 and o4-mini are seriously impressive—multimodal reasoning combined with tool usage takes things to a whole new level. Exciting to see how these models expand what's possible with real-time search, code execution, and visual understanding. Huge leap forward!