MARS5 TTS

Open-source, insanely prosodic text-to-speech (TTS) model

MARS5 is an open-source TTS model that replicates performances (from 2-3s of audio reference) in 140+ languages, even in extremely tough prosodic scenarios like sports commentary, movies, anime & more. Join our Discord https://discord.com/invite/ZzsKTAKM today!
Free

Akshat Prakash
CAMB.AI introduces MARS5, a fully open-source (commercially usable) TTS model with breakthrough prosody and realism, available on our GitHub: https://www.github.com/camb-ai/m...
Why is it different? MARS5 can replicate performances (from 2-3s of audio reference) in 140+ languages, even in extremely tough prosodic scenarios like sports commentary, movies, anime and more: hard prosody that most closed-source and open-source TTS models struggle with today.
We're excited for you to try, build on and use MARS5 for research and creative applications. Let us know any feedback on our Discord!
Highlights:
- Training data: trained on 150K+ hours of data.
- Params: 1.2 Bn (750/450)
- Multilingual: open-sourced in English to begin with, but you can access it in 140+ languages on camb.ai.
- Diversity in prosody: can handle very hard prosodic elements like commentary, shouting, anime etc.
Max Savonin
@akshat_prakash2 Hi CAMB.AI Team, Just saw MARS5 on Product Hunt and the focus on open-source, prosody replication, and multilingual capabilities is exciting! It sounds like a valuable tool for developers and creatives seeking more expressive text-to-speech solutions.
A couple of questions piqued my interest:
- How does MARS5's performance (audio quality, naturalness) compare to existing open-source and closed-source TTS models, particularly for challenging prosody? Are there plans for easy integration with existing development tools?
- Does MARS5 offer any user control over the prosodic aspects of the generated speech? What are your plans for future development, such as expanding language support or offering different "voice styles"?
My background in AI development might be valuable for MARS5's development in areas like:
- Exploring ways to further enhance MARS5's audio quality, naturalness, and robustness for various prosodic scenarios.
- Developing user-friendly tools and libraries that streamline integration of MARS5 into various development workflows.
- Investigating ways for users to customize speech prosody and potentially offer different voice styles or emotional variations.
I'd love to learn more about MARS5's technical details and roadmap. I'm eager to see how my skills can contribute to making it the leading open-source TTS solution for replicating diverse prosody across languages. Thanks for launching such an innovative project! Best regards, Max
Akshat Prakash
@max_savonin1 Hey Max, thanks for the questions and the interest!
- MARS5 is a significant step ahead of most open-source and closed-source TTS models in its ability to capture prosody for tough scenarios like sports, anime, movies etc. MARS is already integrated into CAMB.AI's paid offerings, such as DubStudio (studio.camb.ai), our APIs (docs.camb.ai) and our Chrome extension (https://chromewebstore.google.co...).
- MARS5 replicates the reference speech it is given, so yes, you can control prosody through your choice of reference. Since it's auto-regressive, you can generate a few times until you find the output you like best, and we're quickly baking more inference techniques for higher quality and stability into the release.
- If you're interested in contributing and joining the team, do make at least one pull request to our repo and then feel free to email us. We're hiring!
Frank Denbow
@akshat_prakash2 looks impressive. Would love to see more of the interface and features before having to sign up. Do you envision implementing this at the platform level for video sites?
Akshat Prakash
@frank_denbow hey Frank, I see you're a fellow Carnegie Mellon homeboi; thanks so much for the review. I'm class of SCS as well. GO TARTANS!!!
Luc-Rikardo Fils
@akshat_prakash2 Looks fly! Would love to connect!
Charlotte Hare
Really love the look and feel of the website. It's also really easy to use, which is great! When using the TTS generator (with my voice), I couldn't really replicate anything similar to what I sound like with 2 minutes of studio-quality audio uploaded (maybe I needed more). However, I think open-source work is the way forward, so I appreciate you releasing the code and look forward to seeing advancements in this field :D
Akshat Prakash
@charlotte_hare do join our Discord and let us know how we can help. Thanks for trying it out!
Muhammad Noval
Amazing idea from the team, can’t wait to see it on top
Akshat Prakash
@muhammadnoval1 Thank you noval!!!