Zac Zuo

Ollama v0.7 - Run leading vision models locally with the new engine

Ollama v0.7 introduces a new engine for first-class multimodal AI, starting with vision models like Llama 4 & Gemma 3. It offers improved reliability, accuracy, and memory management for running LLMs locally.

Zac Zuo
Hi everyone! Ollama v0.7 is here, and it's a significant update focused on its new engine for multimodal AI. This is a big step for running powerful vision models locally with Ollama! With this new engine, Ollama now offers first-class, native support for vision models like Meta's Llama 4, Google's Gemma 3, and Qwen 2.5 VL. The aim is improved reliability, accuracy, and memory management when working with these complex models on your own machine. It also simplifies how new models can be integrated into Ollama. Beyond supporting current vision models, this update also lays the groundwork for Ollama to handle more modalities in the future, such as speech, image generation, and video. It's good to see Ollama expanding its core capabilities for advanced local AI.
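To make the "first-class vision support" concrete, here is a minimal sketch of what a request to Ollama's local `/api/chat` endpoint looks like with an image attached. The server is assumed to be running at the default `http://localhost:11434`, and `"gemma3"` stands in for whatever vision model you have pulled; Ollama expects images as base64-encoded strings in the message.

```python
import base64
import json

def build_vision_request(model, prompt, image_bytes):
    """Return the JSON body for a single-turn vision chat request."""
    return json.dumps({
        "model": model,
        "stream": False,
        "messages": [{
            "role": "user",
            "content": prompt,
            # Ollama expects images as base64-encoded strings
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
    })

# A few stand-in bytes in place of a real image file on disk
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
body = build_vision_request("gemma3", "Describe this image.", fake_png)
# POST `body` to http://localhost:11434/api/chat with urllib or requests
```

In a real workflow you would read the image from disk and send the body with an HTTP client; the response contains the model's description of the image.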
Gowtham V

@zaczuo Just tried it out. Works great. Congrats on the launch. We also recently launched FilesMagicAI, which organizes your macOS files automatically using AI. Do check it out and give us feedback. Thanks.

André J

I wonder: how do these OSS offline LLMs perform with agentic tasks?

Zac Zuo

@sentry_co It's doable, but the results depend on what your hardware can handle. For most consumer setups, think smoothly running quantized 7B, maybe up to 13B models, if you want decent speed for agent stuff. If you've got a beefier consumer rig (say, 24GB VRAM), it might push a ~30B model. And to run agentic workflows smoothly, you'd still want something like LangChain to actually build the agent logic on top. Some interesting discussion here.
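The "agent logic on top" boils down to a loop: the model proposes a tool call, the runtime executes it, and the result is fed back until the model produces a final answer. A minimal sketch of that shape, with a hypothetical stub standing in for a local model (a real setup would call Ollama's `/api/chat` endpoint instead):

```python
def fake_local_model(prompt):
    """Stand-in for a quantized local LLM: emits a tool call, then an answer."""
    if "TOOL_RESULT" not in prompt:
        return "CALL add 19 23"        # model decides to use a tool
    return "FINAL 42"                  # model answers once it sees the result

# Registry of tools the agent is allowed to call
TOOLS = {"add": lambda a, b: str(int(a) + int(b))}

def run_agent(task, model=fake_local_model, max_steps=5):
    prompt = task
    for _ in range(max_steps):
        reply = model(prompt)
        if reply.startswith("FINAL"):
            return reply.split(" ", 1)[1]
        _, name, *args = reply.split()
        result = TOOLS[name](*args)          # execute the requested tool
        prompt += f"\nTOOL_RESULT {result}"  # feed the result back to the model
    return None

answer = run_agent("What is 19 + 23?")
print(answer)  # → 42
```

Frameworks like LangChain (or agents like Cline, mentioned below) implement this loop with proper prompt templates, structured tool schemas, and error handling, but the control flow is the same.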

André J

@zaczuo I use Cline as my daily coding agent. It has support for Ollama. The agentic logic in Cline is top notch; it just needs good LLMs. I use GPT-4.1 and Sonnet 3.7 for agentic stuff. People use Google Gemini 2.5 Pro for agentic stuff too. I will try it a bit later. o4-mini is not good for agentic stuff, for instance: too slow. GPT-4.1 mini is too lightweight and gets stuck. What Anthropic calls a thinking model, like Sonnet 3.7, is the best for agentic stuff, I think. So that's what we need: local open-source thinking models for agentic tasks. MCP is the future 🚀

Supa Liu

For those of us who prefer the privacy and control of running LLMs locally, Ollama v0.7's enhanced engine with multimodal capabilities and improved stability makes it an even more compelling platform for exploring the latest AI advancements right on our own machines.

Nemo
This is so exciting. I can't wait to put it up against some OCR solutions.
Ahad Ali

Congrats on the Ollama v0.7 update. It's exciting to see advancements in multimodal AI and local vision models. As you enhance AI capabilities, Tabby, my AI-driven bookkeeping app, could be a great tool for managing finances effortlessly. Looking forward to seeing how Ollama transforms local AI experiences!

William Jin

Running vision models locally could greatly enhance my workflow. How do you envision this impacting your projects?

CaiCai

Could Ollama officially provide a simpler client?

Joy Wang

I've been using Ollama v0.7 recently, and the upgrade is impressive—especially the new support for vision models like Llama 4. It runs more reliably on my local setup and handles memory far better than before.

Erliza. P

Local LLMs getting vision capabilities? The future of AI is looking more multimodal by the day.