This is such a long-awaited feature! The potential to integrate low-latency, scalable voice agents into websites and apps is exciting! You guys are always killing it in the TTS space!!
Just tested it out. IMO the voices sound a little too sterile, which takes away some of the realism. I was actually on the phone with a handful of real phone agents this morning, and here are some differences I noticed:
1. Background noise & traditional phone noises (movement of the phone, static from the receiver, etc.) - This helps provide a sense of realism and adds context to an audio conversation.
2. Varied pace of speech - When the human agent was searching for something or thinking, their speech would slow down and stretch out, like "Hmmm... Leeeeet meee seee whaaat Iiii can ffffind", and then once they found it their speech would speed right up: "Okay, I see it here now, it's XYZ."
These are both things that would make an AI voice nigh indistinguishable from a real voice. The key is that it's not JUST a voice: there's a human behind the voice with certain tendencies, vocal errors, speech patterns, etc. This is what AI is missing.
Having gone down this route for a couple of different business applications, I can attest to the challenges that underlie the task you've made look so simple.
What options are there for embed functionality for websites? I saw a button in the video, but are there other ways for clients to interact with the voice agents?