Zac Zuo

Gemma 3 - Build with multimodal AI from Google

Gemma 3 is Google's new family of models for multimodal AI (text, images, video). 1B-27B sizes, 128K context, 140+ languages. Includes ShieldGemma 2 for safety.

Zac Zuo
Hunter

Hi everyone!

Check out Gemma 3, Google's latest family of models for building multimodal AI applications! This is a big step up from the previous Gemma versions, adding video understanding and a much larger context window.

Key features:

🖼️ Multimodal: Handles text, images, and short videos.
🧠 Multiple Sizes: Available in 1B, 4B, 12B, and 27B parameter versions.
↔️ 128K Context Window: A major increase, allowing for processing much more information.
🌍 Multilingual: Supports over 35 languages out-of-the-box, pretrained on over 140.
🛠️ Integrates with Hugging Face Transformers, Ollama, JAX, Keras, PyTorch, Unsloth, vLLM, and Gemma.cpp.
🛡️ Includes a separate 4B model, ShieldGemma 2, for image safety classification.
⚡ Optimized for NVIDIA GPUs, Google Cloud TPUs, and AMD GPUs.
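
To give a feel for what the four sizes mean in practice, here is a rough back-of-the-envelope sketch of weight memory (parameters times bytes per parameter). The parameter counts are the nominal ones from the size names, and the precision table is an assumption for illustration; real checkpoints differ somewhat, and activations and the KV cache for long contexts are not counted.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: nominal parameter counts taken from the size names (1B, 4B, 12B, 27B).
# Not included: activations, KV cache, runtime overhead.
GEMMA3_SIZES = {"1B": 1e9, "4B": 4e9, "12B": 12e9, "27B": 27e9}

# Bytes needed to store one parameter at common precisions (illustrative choices).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for name, params in GEMMA3_SIZES.items():
    row = ", ".join(
        f"{precision}: {weight_memory_gb(params, precision):.1f} GB"
        for precision in BYTES_PER_PARAM
    )
    print(f"{name:>3} -> {row}")
```

By this estimate the 1B and 4B weights fit comfortably on a typical laptop, while 27B at bf16 needs a large-memory GPU or a quantized build.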

Gemma 3 is a clear sign of how quickly the multimodal AI space is advancing.

Let's start exploring its capabilities in Google AI Studio!

Rami
Can we run this on our own device, or is it only available via the API?
Zac Zuo
Hunter

@kingromstar Yes, you can run it locally; the 1B and 4B models are designed for that. Plz check this out.
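
For anyone who wants to try the local route, here is a minimal sketch using Ollama, one of the integrations listed above. It assumes Ollama is installed and serving on its default local port, and that the model has already been pulled under the tag `gemma3:1b`; the tag and the helper names `build_request` and `generate` are assumptions for illustration, not something confirmed in this thread.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma3:1b") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally served model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma3:1b") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(generate("Explain what a context window is in one sentence."))` should print the model's reply; swapping the tag for a larger size is a one-argument change, hardware permitting.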

Rohan Gayen

@kingromstar @zaczuo What is the use case for the 1B model?

Henry Habib

Google is not stopping. This is a solid addition to the multimodal space, and it makes me wonder what would be a good starting point for building something cool with it.

Stain Lu

I have to say, Gemini 2 powered by Gemma 3 is just incredible in its multimodal capabilities! The image generation and editing in the chat interface blew me away: I was able to create and modify images right in the conversation flow without switching between tools or repeatedly uploading and downloading files. This kind of seamless integration between text and visual creation is exactly what I've been waiting for in AI assistants. The quality and speed of the image generation are impressive too, much more responsive than other multimodal models I've tried. Congrats on the launch!

Mike Staub

It looks like Google is going to win the AI race.

Tim Chosen
This sounds interesting, will give it a spin.