Pi is a toolkit of 30+ AI techniques designed to boost the quality of your AI apps. Pi first builds your scoring system to capture your application requirements, then compiles 30+ optimizers against it: automated prompt optimization, search ranking, RL, and more.
Replies
This is really cool. I see so many uses. As soon as I have a bit of time, I'll look at integrating the scoring and comparison and such into my app using the node client, let it grow the data, and then I'll be able to improve on things.
Do you happen to have an idea of how much this will cost? I see things just work and you're paying for the AI in the playground for now, but obviously that isn't sustainable :)
@fancsiki thanks a ton for your feedback. We haven't sorted out exact pricing yet, but inference endpoints will probably be priced per token and training by GPU hours. We see playgrounds much more as ways to interact with the system than as things we'd price. So the best mental model is probably close to how you interact with OpenAI endpoints. We're still early in the journey, so keen to hear any feedback. Feel free to reach out at david@withpi.ai when you wanna chat some more!
I met @david_karam last summer at the AI Engineer’s World Fair. When we chatted, I was stuck trying to optimize a prompt pipeline that just wouldn’t do what I wanted, and I was patching around it in the only way I could with my software engineering toolkit: iterative refinement, rules-based approaches, and decomposition.
Of course it wasn’t going to work. I had ML envy, eyeing all the fancy RL and finetuning that the ML engineers around me understood, and I didn’t know where to start. I knew I needed the power of the tools they were experts in, but I couldn’t afford a month away from product development to go off and figure out how to build a scoring model.
And then David told me they were building exactly that! High-level primitives and workflows that give software engineers the power of ML expertise on their team. I can just grab an API token and immediately have the primitives I need to focus on the product.
I’m super excited to start building with Pi. I love a leveraged tool that lets me focus on the product. The roadmap of what’s coming looks awesome. Congratulations to the team on launching something truly useful right out of the gate.
SaaS founder here. How does Pi handle real-time feedback loops for continuous improvement? Can it adapt scoring models based on user interactions over time?
Product Hunt
Nice work launching on Pi Day!
Forgive the naive question, but how does this stack up against RAG and other approaches people are using to improve models right now?
@rajiv_ayyangar thanks for the warm wishes! There is a very natural evolution from the standard stack to weaving Pi into it.

For example, if you have a RAG setup that works with basic relevance matching but isn't capturing the nuances of your domain, you might want to build a custom Ranker (e.g. in a hypothetical Product Hunt RAG case, post relevance is not enough; you want a launch popularity signal, a user credibility signal, etc. to really tell which posts should rank higher than others). The same applies to optimizing AIs: you can start by writing your prompt manually, but once you have 1,000 datapoints you'll want to dynamically choose 10 to add as few-shot examples. Even when writing prompts, you can bootstrap by hand, then quickly move to auto-optimization against your metrics to get the best possible prompt without manual tweaking.

So I would say that today the starting point is the standard stack, and all these algorithms layer on as your application grows in maturity. Our hope is that by lowering the bar to entry, this approach becomes a Day 1 approach, similar to how React/Angular started as "only if your website is complex" and then became the obvious way to build from Day 1.
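To make the custom Ranker idea concrete, here's a minimal TypeScript sketch (the node client came up earlier in the thread). The signal scorers, weights, and names below are purely illustrative assumptions, not Pi's actual API:

```typescript
// Hypothetical sketch only: the signal scorers below stand in for custom
// scorers you would build against your own requirements; not Pi API calls.

interface Post {
  id: string;
  title: string;
  upvotes: number;
  authorKarma: number;
}

// Placeholder signal scorers, each normalized to [0, 1].
const scoreRelevance = (query: string, post: Post): number =>
  post.title.toLowerCase().includes(query.toLowerCase()) ? 1 : 0;

const scoreLaunchPopularity = (post: Post): number =>
  Math.min(post.upvotes / 1_000, 1);

const scoreUserCredibility = (post: Post): number =>
  Math.min(post.authorKarma / 10_000, 1);

// Blend relevance with the domain signals; the weights are illustrative and
// would come from calibration against real user feedback in practice.
function rankPosts(query: string, posts: Post[]): Post[] {
  const combined = (post: Post): number =>
    0.5 * scoreRelevance(query, post) +
    0.3 * scoreLaunchPopularity(post) +
    0.2 * scoreUserCredibility(post);

  return [...posts].sort((a, b) => combined(b) - combined(a));
}
```

The point is not the specific weights; it's that each signal becomes an explicit, measurable scorer you can calibrate and swap independently instead of relying on relevance alone.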
Fable Wizard
This tool sounds amazing for making complex systems easier to manage! How does Pi ensure the models stay efficient and scalable when working with large datasets, and are there any specific projects you've found it especially helpful for?
@jonurbonas Thanks for the feedback Jonas! Re large datasets: because the scoring system is really quick and cheap, large datasets can always be filtered and pruned at low computational cost. Applying scoring to datasets also helps trim their size, making training faster and cheaper at the same quality (lower volume but higher quality data helps models converge faster and more accurately). So the methodology basically pushes a lot of the scaling into the pre-processing step, giving you higher guarantees. A side note: the scoring system can also act as a reward model, which means you can start moving training pipelines toward algorithms like GRPO, which are way more efficient. So while that doesn't technically improve your existing training process, it opens the door to a much more scalable and efficient one.
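For a concrete picture of that pre-processing step, here's a minimal sketch, assuming a hypothetical `scoreExample` function that stands in for whatever scoring system you've built (not a specific Pi endpoint):

```typescript
// Hypothetical sketch only: `scoreExample` stands in for a fast, cheap scoring
// system calibrated to your requirements; it is not a specific Pi endpoint.

interface Example {
  input: string;
  output: string;
}

// Placeholder scorer returning a quality score in [0, 1].
async function scoreExample(example: Example): Promise<number> {
  return example.output.trim().length > 0 ? 0.9 : 0.1; // stand-in logic
}

// Keep only examples above a quality threshold: lower volume, higher quality,
// so the expensive training step runs on a smaller, cleaner dataset.
async function pruneDataset(
  dataset: Example[],
  threshold = 0.8
): Promise<Example[]> {
  const scored = await Promise.all(
    dataset.map(async (ex) => ({ ex, score: await scoreExample(ex) }))
  );
  return scored.filter(({ score }) => score >= threshold).map(({ ex }) => ex);
}
```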
Re specific projects:
- Built a scoring system calibrated to users' thumbs up/down for Gamma, ensuring everything they measure in their pipeline is predictive of their user feedback.
- Building a custom ranking for an e-commerce company that incorporates many domain signals beyond relevance (merchant trust, product popularity, etc.), using the same calibrated metrics tree approach.
- Quality control for a human-in-the-loop process where human labelers give feedback to the model, reducing how much human intervention is needed (a sketch of this gating pattern follows the list).
- Back at Google Search, the filtering mechanism mentioned above was standard practice for keeping all our datasets fresh as product requirements evolved.
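As promised above, here's a minimal sketch of that human-in-the-loop gating pattern; `scoreOutput` and the review threshold are illustrative assumptions, not Pi's actual API:

```typescript
// Hypothetical sketch only: `scoreOutput` and the threshold stand in for a
// calibrated quality scorer; neither is Pi's actual API.

interface ModelOutput {
  id: string;
  text: string;
}

// Placeholder scorer returning a calibrated quality score in [0, 1].
async function scoreOutput(output: ModelOutput): Promise<number> {
  return output.text.trim().length > 0 ? 0.9 : 0.2; // stand-in logic
}

// Auto-accept outputs the scorer is confident about; route the rest to
// human labelers, so review time is spent only where it is needed.
async function triage(outputs: ModelOutput[], reviewBelow = 0.7) {
  const autoAccepted: ModelOutput[] = [];
  const needsHumanReview: ModelOutput[] = [];
  for (const output of outputs) {
    if ((await scoreOutput(output)) >= reviewBelow) {
      autoAccepted.push(output);
    } else {
      needsHumanReview.push(output);
    }
  }
  return { autoAccepted, needsHumanReview };
}
```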
Happy to chat some more, so feel free to drop any questions, and thanks for your thoughts!
RAEK
@david_karam well played, launching on March 14th.