Trieve
All-in-one AI Infrastructure Suite
Federico Chávez Torres
Trieve Vector Inference — Deploy fast, unmetered embedding inference in your own VPC
TVI is an in-VPC solution for fast, unmetered embedding inference. Get fastest-in-class embeddings using any private, custom, or open-source model from dedicated embedding servers hosted in your own cloud. Battle-tested on billions of documents and queries.
Replies
Federico Chávez Torres
Hello y'all,

My name is Fede. I'm the least technical member of Trieve, and I'm proud to announce the launch of our standalone embedding and reranking inference product, Trieve Vector Inference, on Product Hunt.

We've been building AI applications together since late 2022. As we matured and eventually pivoted hard into building infrastructure, we quickly learned what we could and could not control. There were two major bottlenecks to becoming the performant end-to-end API we are today, and the most important one was embedding and reranking inference. Building AI features at scale exposes two critical limitations of cloud embedding APIs: high latency and rate limits. Modern AI applications require better infrastructure.

The platform supports any embedding model, whether it's your own custom model, a private model, or a popular open-source option. You get the flexibility to choose the right model for your use case while maintaining complete control over your infrastructure.

We put together TVI to eliminate these bottlenecks for our own core product. It has served billions of queries across billions of documents. After requests from others, we've sanded it down, written up some docs, and are now making it available to all. You can even get it on AWS Marketplace!

Sincerely,
Fede

P.S. If you're curious about the other bottleneck, we have a sister launch going on right now as well for PDF2MD, a lightweight and powerful OCR service. Just click on our company profile to check it out (and support it!)
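P.P.S. For the curious, here's a rough sketch of what calling a dedicated embedding server in your own VPC looks like. The host, route, payload shape, and response format below are illustrative assumptions, not our exact API; the docs have the real spec:

```python
# Hypothetical sketch of calling a dedicated embedding server inside your VPC.
# The host, route, and payload shape are illustrative assumptions.
import requests

TVI_URL = "http://embeddings.internal.example.com"  # assumed in-VPC endpoint

def embed(texts: list[str]) -> list[list[float]]:
    """Send a batch of texts, get back one embedding vector per text."""
    resp = requests.post(
        f"{TVI_URL}/embed",
        json={"inputs": texts},  # request shape is an assumption, not the real API
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

vectors = embed(["what is vector inference?", "launch day at Trieve"])
print(f"{len(vectors)} embeddings, dim {len(vectors[0])}")
```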
Rodrigo Mendoza-Smith
This problem is so Trieve! As I read about "solving bottlenecks" and "building fast APIs for embedding and reranking inference", I couldn't think of any other team that could be behind this. I'm really curious to know how you made the reranking inference so quick—I'll be checking out your repo soon :)
Federico Chávez Torres
@r0dms hahaha thank you, yes it's very "Trieve" indeed. It works extremely well and it's something we're looking to push heavily. It's crazy how much control, quality, and speed you give up to cloud products.
Nevo David
Amazing product, I love that it's open-source :)
Huzaifa Shoukat
Congrats on the launch of TVI! This looks like a game-changer for embedding inference in the cloud. How do you handle scaling and pricing for different use cases?
Federico Chávez Torres
@ihuzaifashoukat Thanks! This is a kube-based product, so it handles scale super well. In terms of pricing, it's a flat licensing fee of $500. What compute you choose to spend on is up to you! There's a lot of good information in our write-up about it here: https://trieve.ai/tvi-blog
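Since it's standard Kubernetes under the hood, scaling is the usual drill. As a sketch (the deployment name and namespace here are illustrative, not what our charts actually create):

```python
# Hypothetical sketch: bumping embedding-server replicas with the official
# Kubernetes Python client. Deployment and namespace names are illustrative.
from kubernetes import client, config

config.load_kube_config()  # picks up your local kubeconfig
apps = client.AppsV1Api()

apps.patch_namespaced_deployment_scale(
    name="tvi-embeddings",            # assumed deployment name
    namespace="default",
    body={"spec": {"replicas": 4}},   # scale embedding servers to 4 pods
)
```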