Qwen2.5-Omni is an end-to-end multimodal model from the Qwen team at Alibaba Cloud. It understands text, images, audio, and video, and generates both text and natural streaming speech.
You can now use Voice and Video Chat directly in Qwen Chat! Powering these new multimodal interactions is Qwen's latest open-source model: Qwen2.5-Omni.
This "omni" model is a single system that understands text, audio, images, and video, while outputting both text and natural-sounding audio.
Key aspects:
🔄 End-to-End Multimodal: A single "Thinker-Talker" architecture designed for seamless input/output across modalities.
💬 Real-Time Interaction: Built for streaming, enabling smooth voice and video chat experiences.
🗣️ Natural Speech Output: Claims strong performance in speech generation quality.
💪 Strong Across Modalities: Performs well on benchmarks for vision, audio, and text tasks.
🔓 Openly Available with Apache 2.0 license: Released on Hugging Face, ModelScope, and GitHub, with API access via DashScope.
The Qwen team believes this type of omni model is key for the future of AI agents. While this is still just the 7B version, it's impressive to see this level of multimodality in an open model.
Head over to Qwen Chat, toggle the new voice & video chat button, and experience it!
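Since the model is exposed through an API as well as the chat UI, here is a minimal sketch of what a multimodal request to an "omni"-style model could look like. It assumes an OpenAI-style chat message schema with typed content parts; the model id "qwen2.5-omni-7b", the helper name, and the field names are illustrative assumptions, not taken from official Qwen documentation.

```python
# Sketch: composing one multimodal chat request for an "omni" model.
# Assumption: an OpenAI-style schema where a user message carries a list
# of typed content parts (text, image, audio) instead of a plain string.

def build_omni_request(text, image_url=None, audio_url=None):
    """Bundle text plus optional image/audio parts into one user message."""
    content = [{"type": "text", "text": text}]
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    if audio_url:
        content.append({"type": "input_audio", "input_audio": {"url": audio_url}})
    return {
        "model": "qwen2.5-omni-7b",  # assumed id; check the provider's model list
        "stream": True,              # the model is built for streaming output
        "messages": [{"role": "user", "content": content}],
    }

req = build_omni_request(
    "What is happening in this clip?",
    image_url="https://example.com/frame.jpg",
)
print(len(req["messages"][0]["content"]))  # one text part plus one image part
```

The point of the typed-parts layout is that the same request shape covers any mix of modalities, which is what distinguishes an omni model from a text-only chat endpoint.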
This is a very interesting AI! Congratulations on the launch! My three favorite AIs are Qwen, DeepSeek, and ChatGPT, and two of the three are free. I wish you continued growth!
Replies
Alibaba is shipping fast
Excited about multimodal AI’s potential! 👀
Hoping I can explore even more advanced speech synthesis in future updates.
Free! Now let's see who is truly "open" AI.
7B multimodal awesome!!!
This will make my workflow smoother when working across multiple formats.