Ollama's new official desktop app for macOS and Windows makes it easy to run open-source models locally. Chat with LLMs, use multimodal models with images, or reason about files, all from a simple, private interface.
Ollama v0.7 introduces a new engine for first-class multimodal AI, starting with vision models such as Llama 4 and Gemma 3. The new engine offers improved reliability, accuracy, and memory management for running LLMs locally.
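As a rough sketch of how a vision model is driven locally: Ollama exposes a REST API (by default on `localhost:11434`), and its `/api/generate` endpoint accepts base64-encoded images in an `images` field next to the text prompt. The helper name and the `gemma3` model tag below are illustrative choices, not part of the release notes.

```python
import base64
import json

def build_vision_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a JSON payload for Ollama's local /api/generate endpoint.

    Vision models take images as base64-encoded strings in the
    `images` list alongside the plain-text prompt.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

# Illustrative usage; actually sending it requires a running Ollama
# server and a pulled vision model, e.g.:
#   requests.post("http://localhost:11434/api/generate", json=payload)
payload = build_vision_request("gemma3", "Describe this image.", b"\x89PNG...")
print(json.dumps(payload)[:40])
```

Setting `"stream": False` returns one complete JSON response instead of a stream of partial tokens, which keeps simple scripts simple.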
Alongside its support for OpenAI's gpt-oss models, Ollama introduced Turbo: a privacy-first, datacenter-grade cloud inference service.
While it is currently in preview, the service costs $20/month and has both hourly and daily usage limits; usage-based pricing will be available soon. So far Turbo offers only the gpt-oss-20b and gpt-oss-120b models, and it works with Ollama's app, CLI, and API.
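Since Turbo is reachable through the same API surface as a local server, a hosted call can be sketched as the local `/api/chat` request pointed at `ollama.com` with an API key. The bearer-token header and the exact hosted URL below are assumptions based on how the local API works; check Ollama's Turbo documentation before relying on them.

```python
import json

# Assumed hosted endpoint; verify against Ollama's Turbo docs.
OLLAMA_TURBO_HOST = "https://ollama.com"

def build_turbo_chat_request(api_key: str, model: str, prompt: str):
    """Build (url, headers, body) for a hosted chat call.

    The body mirrors the local /api/chat schema; authenticating with a
    Bearer token is an assumption about Turbo, not a documented fact here.
    """
    url = f"{OLLAMA_TURBO_HOST}/api/chat"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return url, headers, body

# Illustrative usage with a placeholder key and the 120b model tag:
url, headers, body = build_turbo_chat_request("sk-example", "gpt-oss:120b", "Hello!")
print(url)
```

Keeping the request shape identical to the local API is what lets the same app, CLI, and API clients switch between local and Turbo inference.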