RTVI-AI is a new open standard for Real-time Voice and Video Inference. Open source reference JavaScript and React SDKs are available today, with iOS, Android and other platform SDKs coming soon.
I'm a fan of anything that enables builders to build better, faster, and more expressively. This seems promising in that regard. I know @kwindla and the Daily.co team have many decades of combined experience with the WebRTC community and other open source projects. It's exciting to see them setting a new standard for real-time AI inference.
From @kwindla:
---
Today we’re announcing an open standard for Real-time Voice and Video Inference: RTVI-AI.
The RTVI abstractions and data structures define how client applications communicate with inference services. These are the “real-time APIs” for use cases like:
- Voice chat with LLMs
- Enterprise voice workflows such as healthcare patient intake
- Video avatars and immersive experiences
- Voice-driven user interfaces
- Voice conversational apps for education, customer support, and games
- High-framerate image generation and streaming generative video
We’re shipping open source reference JavaScript and React SDKs today, with iOS, Android and other platform SDKs coming soon.
This first release has been several months in the making, and incorporates work and insights from Groq, Deepgram, fal, Cartesia, Cerebrium, Vapi, and Daily.
With RTVI, a “hello world” voice-to-voice AI chat app in JavaScript is 21 lines of code.
If you want to build real-time AI applications, implement infrastructure for real-time inference, or implement your own SDKs that leverage the RTVI standard, you are more than welcome to join this project. We welcome all contributions and ideas!
Thanks, @rajiv_ayyangar! Really fun to see this on Product Hunt. We've been building a lot of real-time voice and video AI apps, and there's so much potential to do useful, interesting new things.
There's a live demo here: https://demo.rtvi.ai/
And lots of good discussion on the Discord here: https://discord.com/invite/pipecat
Our goal with RTVI is to make it easy to build AI voice-to-voice and real-time video applications.
* Application developers should be able to write code that can use any inference service.
* Inference services should be able to leverage open source for the complicated, client-side developer tooling needed for real-time multimedia.
* Any developer should be able to trivially stand up real-time AI infrastructure for small-scale use, testing, or prototyping.
@semanser Yes, in the example above, it is sending both audio and video, but only receiving audio. It is possible to manipulate the video within pipecat (the server side) and send it back. We will have demo code for this on GitHub shortly!
Yes, pipecat supports and defaults to Daily's WebRTC transport. So you get all the benefits of WebRTC's low latency and Daily's Global Mesh-SFU infrastructure.
Wow, this is super exciting stuff! 🚀 Kudos to @kwindla and the Daily.co team for pushing the boundaries of real-time AI inference! I love how RTVI-AI opens up so many possibilities for builders to create innovative solutions.
The use cases listed are mind-blowing, especially voice chat with LLMs and immersive video experiences! Can't wait to see the iOS and Android SDKs roll out too. It’s great to know that it’s open-source, making it so accessible for developers!
Definitely looking forward to trying out that 21-line "hello world" app. This feels like just the beginning. Let’s build some awesome stuff together!
@kyrylosilin the goal of RTVI is to be able to write the client-side code without worrying about the underlying infrastructure. The infrastructure should, in theory, be swappable.
The current RTVI implementation uses pipecat bots, which use WebRTC and the @dailyco infrastructure.
The daily.co infrastructure can manage tens of millions of simultaneous calls, and we have a global footprint of 15 geo locations around the world, including us-east, us-west, canada, london, frankfurt, middle-east, mumbai, singapore, seoul, sydney, capetown, and saopaulo.
That being said, since RTVI is open source, it’s possible to add other types of transports or services.
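To illustrate what "swappable" could look like in practice, here is a minimal TypeScript sketch. It is not the RTVI SDK's API: `Transport`, `LoopbackTransport`, and `VoiceClient` are hypothetical names made up for this example, and `connect` is synchronous only to keep the sketch short (a real network transport would be asynchronous). The idea is simply that the client is written against an interface, so a WebRTC transport or any other one can be dropped in behind it.

```typescript
// Hypothetical sketch: none of these names come from the RTVI SDKs.

// The minimal contract the client needs from any transport.
interface Transport {
  readonly name: string;
  connect(): void;
  sendAudio(chunk: Uint8Array): void;
  onBotAudio(handler: (chunk: Uint8Array) => void): void;
}

// An in-memory stand-in that echoes audio straight back; handy for tests.
class LoopbackTransport implements Transport {
  readonly name = "loopback";
  private handlers: Array<(chunk: Uint8Array) => void> = [];
  private connected = false;

  connect(): void {
    this.connected = true;
  }

  sendAudio(chunk: Uint8Array): void {
    if (!this.connected) throw new Error("transport not connected");
    for (const handler of this.handlers) handler(chunk);
  }

  onBotAudio(handler: (chunk: Uint8Array) => void): void {
    this.handlers.push(handler);
  }
}

// Client code depends only on the Transport interface, never on a concrete class.
class VoiceClient {
  private transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  start(onAudio: (chunk: Uint8Array) => void): void {
    this.transport.onBotAudio(onAudio);
    this.transport.connect();
  }

  say(chunk: Uint8Array): void {
    this.transport.sendAudio(chunk);
  }
}
```

Swapping infrastructure then means passing a different `Transport` implementation to `new VoiceClient(...)`; the application code that calls `start` and `say` stays the same.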
This sounds like a valuable tool for developers working with real-time voice and video. The open source approach and upcoming platform SDKs are impressive.
What is RTVI-AI? Is it a new way to use AI for real-time voice and video? Can developers use it easily with JavaScript and React now? Will there be tools for other platforms like phones soon?
@jpgohil93 Yes, we launched with the React and Web/JS SDKs today. We are working on iOS and Android SDKs, which will be announced shortly.
The way to think about this is that RTVI is the client-side implementation, which is open source and can essentially connect to any server-side RTVI implementation. Today, the server-side implementation is pipecat.ai, which coordinates with the configured speech-to-text, LLM, and text-to-speech services.
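As a rough mental model of that server-side coordination, the TypeScript sketch below chains three configured stages for one conversational turn. It is an illustration only, not Pipecat's or RTVI's actual API: `PipelineConfig`, `runTurn`, and the stub stages are hypothetical, and a real pipeline streams audio asynchronously rather than handling one buffer at a time.

```typescript
// Hypothetical sketch: these types are illustrative, not the RTVI or Pipecat API.

// Each stage is a plain function, so any provider can be plugged in.
type SpeechToText = (audio: Uint8Array) => string;
type LanguageModel = (prompt: string) => string;
type TextToSpeech = (text: string) => Uint8Array;

interface PipelineConfig {
  stt: SpeechToText;
  llm: LanguageModel;
  tts: TextToSpeech;
}

// One conversational turn: user audio in, bot audio out.
function runTurn(config: PipelineConfig, userAudio: Uint8Array): Uint8Array {
  const transcript = config.stt(userAudio);
  const reply = config.llm(transcript);
  return config.tts(reply);
}

// Stub stages standing in for real services.
const stubConfig: PipelineConfig = {
  // Pretend transcription: just describe how much audio arrived.
  stt: (audio) => `[${audio.length} bytes of speech]`,
  // Pretend LLM: echo the transcript back.
  llm: (prompt) => `You said: ${prompt}`,
  // Pretend synthesis: one silent byte per character of text.
  tts: (text) => new Uint8Array(text.length),
};

const botAudio = runTurn(stubConfig, new Uint8Array(3));
```

Because the client only talks to the server-side implementation, changing which speech-to-text, LLM, or text-to-speech service is configured would not require changes to client code in this model.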