Tushar Mehrotra

We built a new CPU-only AI stack: Speech-to-text + language inference + real-time intent detection

Hey all — I'm one of the co-creators of an open-source inference stack we’ve been developing as part of our work at United We Care, a mental health and AI infra company.

We’ve benchmarked this stack against OpenAI’s Whisper, ElevenLabs, NVIDIA, and Meta’s public models, and it consistently outperforms them on:

WER (Speech-to-text): 6.2% on noisy, multi-accent corpora (quick WER refresher just after this list)

Language inference: Fast contextual reasoning that beats Meta’s LLaMA models in our benchmarks

Intent detection: Real-time inference (<65ms latency on CPU)

Streaming input: Live video/audio support, no GPU needed
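
For anyone less familiar with the metric, word error rate is the standard speech-to-text accuracy measure (this is the general definition, nothing specific to our setup):

WER = (S + D + I) / N

where S, D, and I are the substitutions, deletions, and insertions needed to turn the system's transcript into the reference, and N is the number of words in the reference. So 6.2% WER means roughly 6 word-level mistakes per 100 reference words.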

Built with a mix of AVX-512 vectorization, LoRA-edge compression, and Z-Phase self-supervised pretraining. The entire stack deploys on-prem, air-gapped, or in your local dev env.
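
To make the AVX-512 point concrete, here's a minimal sketch of the kind of vectorized inner loop that CPU-only inference leans on. This is illustrative only, not our actual kernel; the function name is made up, and for brevity it assumes the length is a multiple of 16.

```c
#include <immintrin.h>
#include <stdio.h>
#include <stddef.h>

/* Illustrative AVX-512 dot product: 16 floats per iteration via
 * fused multiply-add. Assumes n is a multiple of 16 for brevity;
 * a real kernel would also handle the remainder and alignment. */
static float dot_avx512(const float *a, const float *b, size_t n) {
    __m512 acc = _mm512_setzero_ps();
    for (size_t i = 0; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);   /* unaligned 16-float load */
        __m512 vb = _mm512_loadu_ps(b + i);
        acc = _mm512_fmadd_ps(va, vb, acc);   /* acc += va * vb, per lane */
    }
    return _mm512_reduce_add_ps(acc);         /* sum the 16 lanes */
}

int main(void) {
    float a[16], b[16];
    for (int i = 0; i < 16; i++) { a[i] = 1.0f; b[i] = (float)i; }
    printf("%f\n", dot_avx512(a, b, 16));     /* 0 + 1 + ... + 15 = 120 */
    return 0;
}
```

Built with something like `gcc -O2 -mavx512f`, each iteration fuses 16 multiply-adds into a single instruction, which is a big part of how dense matmuls stay fast enough to skip the GPU.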

We’re giving free access to the first 99 devs on our waitlist:

👉 https://tally.so/r/meG5RQ

Check out our Product Hunt listing: https://www.producthunt.com/products/shunya-labs-united-we-care?utm_source=other&utm_medium=social

Happy to share papers, benchmarks, and repos (currently in private beta), and to answer questions if helpful!
