David Karam

Pi - The ML & Data Science toolkit; built for Software Engineers.

Pi is a toolkit of 30+ AI techniques designed to boost the quality of your AI apps. Pi first builds your scoring system to capture your application's requirements, then compiles 30+ optimizers against it: automated prompt optimization, search ranking, RL, and more.

David Karam
Maker
👋 Hi Product Hunt makers! I'm David, co-founder at Pi Labs. Our goal is to put the most advanced ML and data science techniques and algorithms in the hands of all software engineers, so that you can build AI applications at the same level of performance and sophistication as the big labs. We're excited to announce our first product launch and keen to get feedback from the maker community.

Pi can very quickly:

🔢 Build your scoring system to reliably measure your AI's response quality
💬 Auto-optimize your prompt by running DSPy algorithms against your scoring system
🧾 Generate cheap, high-quality synthetic data based on seed input to train your model
📈 Compile high-quality reward models to run RL algorithms like PPO and GRPO
🔄 Build your feedback loop by finetuning your scoring model with your user data
🔎 Rewrite your queries and customize your ranking for your RAG backend
… and more!

What's special about Pi's technology?

** Pi is inspired by the MVC architecture for the web. Scorers sit on one side of the loop, optimizers on the other. You update your scorers, and they auto-update your optimizers, keeping your loop in sync. All optimizers "compile" against the same scoring system. This means you just interface with scoring, letting Pi handle the algorithmic heavy lifting. (There's a rough sketch of this loop at the end of this comment.)

** Pi's scoring models are small, fast encoder models trained specifically for scoring. They let you assess 20+ quality dimensions in sub-100ms. This means you can use them well beyond evaluation: from reward modeling for RL to online control flow with agents.

** Pi's scoring models can be calibrated with human feedback. Manual calibration, labeled data, or preference pairs: your scoring system adjusts to your and your users' preferences, creating robust feedback loops for your application.

** Pi's playgrounds let you easily interact with even complex workflows like synthetic data generation or routing. When you're done vibe-checking any particular technique or algorithm, you can deploy it in code to scale.

Try Pi out, no sign-in required, at https://build.withpi.ai. Have Pi build your first scoring system in less than 2 minutes and start optimizing your AI right away. You can also visit https://code.withpi.ai for our API reference and links to end-to-end tutorials and notebooks that show you how to use these techniques in real-world examples.

Excited to hear your feedback!
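To give a feel for the scorer/optimizer split, here's a minimal sketch. To be clear, the class and function names below are made up for illustration and are not our actual client API (see https://code.withpi.ai for the real reference):

```python
# Illustrative sketch only: `Scorer` and `total_score` are hypothetical names,
# not Pi's actual API. The point is the shape of the loop: many weighted
# quality dimensions on one side, a single aggregate score on the other
# that every optimizer compiles against.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scorer:
    """One quality dimension in the metrics tree, e.g. 'groundedness'."""
    name: str
    weight: float
    score: Callable[[str, str], float]  # (user_input, response) -> value in [0, 1]

def total_score(scorers: list[Scorer], user_input: str, response: str) -> float:
    """Weighted aggregate over all dimensions: the one number that prompt
    optimizers, rankers, and RL reward loops all target."""
    total_weight = sum(s.weight for s in scorers)
    return sum(s.weight * s.score(user_input, response) for s in scorers) / total_weight

# Example with two toy dimensions and made-up heuristics.
scorers = [
    Scorer("concise", 1.0, lambda _, r: 1.0 if len(r) < 500 else 0.5),
    Scorer("cites_source", 2.0, lambda _, r: 1.0 if "http" in r else 0.0),
]
print(total_score(scorers, "What is Pi?", "Pi is a scoring toolkit: https://withpi.ai"))
```

Because every optimizer consumes the same aggregate score, changing a scorer's weight or adding a dimension automatically changes what every optimizer is chasing. That's the MVC-style sync mentioned above.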
R. Fancsiki

This is really cool. I see so many uses. As soon as I have a bit of time, I'll look into integrating the scoring and comparison features into my app using the Node client, let it grow the data, and then improve on things from there.

Do you happen to have an idea of how much this will cost? I see things just work and you're paying for the AI in the playground for now, but obviously that isn't sustainable :)



David Karam

@fancsiki thanks a ton for your feedback. We haven't sorted out exact pricing yet, but for inference endpoints it'll probably be per-token pricing, and for training, GPU hours. We see playgrounds much more as ways to interact with the system than as things we'd price. So the best model to have in mind is probably close to how you interact with OpenAI endpoints. We're still early in the journey, so keen to hear any feedback. Feel free to reach out at david@withpi.ai when you wanna chat some more!

Jonas Urbonas

This tool sounds amazing for making complex systems easier to manage! How does Pi ensure the models stay efficient and scalable when working with large datasets, and are there any specific projects you've found it especially helpful for?

David Karam

@jonurbonas Thanks for the feedback Jonas! Re large datasets, the scoring system being really quick and cheap means large datasets can always be filtered and pruned at low computational cost. Applying scoring to datasets also trims down their size, making training faster and cheaper at the same quality (lower-volume but higher-quality data helps models converge faster and more accurately). So the methodology basically pushes a lot of the scaling work to the pre-processing step, giving you stronger guarantees. A side note: the scoring system can also act as a reward model, which means you can start moving training pipelines toward algorithms like GRPO that are much more efficient. So while that doesn't technically improve your existing training process, it opens the door to a much more scalable and efficient one.
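To make the filtering step concrete, here's a toy sketch; `score_fn` stands in for a fast scoring call, and the field names and threshold are made up for the example:

```python
# Toy illustration of score-based pruning: keep only high-scoring rows so
# training runs on less, but higher-quality, data. `score_fn` is a stand-in
# for a fast scoring call; field names and threshold are illustrative only.
def prune_dataset(rows: list[dict], score_fn, threshold: float = 0.7) -> list[dict]:
    """Score every (input, output) pair and drop low-quality rows."""
    return [row for row in rows if score_fn(row["input"], row["output"]) >= threshold]
```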


Re specific projects,

** Built a scoring system calibrated to users' thumbs up/down for Gamma, ensuring everything they measure in their pipeline is predictive of their user feedback.

** Building custom ranking for another company, incorporating many domain signals beyond relevance for e-commerce (merchant trust, product popularity, etc.), using the same calibrated metrics-tree approach.

** Quality control for a human-in-the-loop process where human labelers give feedback to the model, reducing the amount of time human intervention is needed.

** Back at Google Search, the filtering mechanism mentioned above was standard practice for keeping all our datasets fresh as product requirements evolved.


Happy to chat some more, so feel free to drop any questions. And thanks for your thoughts!

Fred Jonsson

I met @david_karam last summer at the AI Engineer's World Fair. When we chatted, I was stuck trying to optimize a prompt pipeline that just wouldn't do what I wanted, and I was patching around it in the only way I could with my software engineering toolkit: iterative refinement, rules-based approaches, and decomposition.


Of course it wasn't going to work. I had ML envy: I was eyeing all the fancy RL and finetuning that the ML engineers around me understood, with no idea where to start. I knew I needed the power of the tools they were experts in, but I couldn't afford a month away from product development to go off and figure out how to build a scoring model.


And then David told me they were building exactly that! High-level primitives and workflows that give software engineers the power of ML expertise on their team. I can just grab an API token and immediately have the primitives I need to focus on the product.


I'm super excited to start building with Pi. I love a leveraged tool that lets me focus on the product, and the roadmap of what's coming looks awesome. Congratulations to the team on launching something truly useful right out of the gate.

Nisarg Patel

SaaS founder here. How does Pi handle real-time feedback loops for continuous improvement? Can it adapt scoring models based on user interactions over time?

David Karam

@nisarg1212 Yes! That's what's special about these scoring models: they can be adapted based on labeled data as well as preference data. What happens is that the weights in your metrics tree get reweighted to reflect the importance your data implicitly ascribes to each dimension. E.g., your users might really care about one thing vs. another, so the tree weights shift toward that thing. Given the size of the models we use, calibration happens in minutes, so you can retrain those scorers every hour if you want a very tight feedback loop. Feel free to reach out directly at david@withpi.ai if you want to learn more.
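As a toy illustration of that reweighting (not our actual calibration code; the dimension names and update rule are made up):

```python
# Toy sketch of reweighting a metrics tree from preference pairs: dimensions
# where the preferred response outscores the rejected one gain weight.
# Dimension names, learning rate, and update rule are illustrative only.
def calibrate(weights: dict[str, float],
              pairs: list[tuple[dict[str, float], dict[str, float]]],
              lr: float = 0.1) -> dict[str, float]:
    """Each pair holds per-dimension scores: (preferred response, rejected response)."""
    new = dict(weights)
    for preferred, rejected in pairs:
        for dim in new:
            # Nudge the weight up when this dimension agrees with the user's choice.
            new[dim] += lr * (preferred[dim] - rejected[dim])
    total = sum(max(w, 0.0) for w in new.values())
    return {dim: max(w, 0.0) / total for dim, w in new.items()}

weights = {"helpfulness": 0.5, "brevity": 0.5}
pairs = [({"helpfulness": 0.9, "brevity": 0.3}, {"helpfulness": 0.4, "brevity": 0.8})]
print(calibrate(weights, pairs))  # helpfulness gains weight, brevity loses it
```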

Rajiv Ayyangar

Nice work launching on Pi Day!
Forgive the naive question, but how does this stack up with RAG and other approaches people are using to improve models right now?

David Karam

@rajiv_ayyangar thanks for the warm wishes! There's a very natural evolution from the standard stack to weaving Pi into it. For example, if you have a RAG setup that works with basic relevance matching but isn't capturing the nuances of your domain, you might want to build a custom ranker (e.g. in a hypothetical Product Hunt RAG case, post relevance isn't enough; you want a launch-popularity signal, a user-credibility signal, etc. to really tell which posts should rank higher than others). The same applies to optimizing AIs: you can start by manually writing your prompt, but once you have 1000 datapoints you'll want to dynamically choose 10 to add as few-shot examples. Even prompt writing itself can be bootstrapped manually, then quickly moved to auto-optimization against your metrics to get the best possible prompt without manual tweaking.

So I would say that today, the starting point is the standard stack, and all these algorithms layer on as your application grows in maturity. Our hope is that by lowering the bar to entry, this becomes a Day 1 approach, similar to how React/Angular started as "only if your website is complex" and then became the obvious way to build from Day 1.
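To sketch that hypothetical Product Hunt ranker (all field names and weights here are invented for illustration, not a real API):

```python
# Hypothetical custom ranker for the Product Hunt RAG example above:
# blend semantic relevance with domain signals instead of relevance alone.
# All field names and weights are made up for illustration.
def rank_posts(posts: list[dict], query: str, relevance_fn) -> list[dict]:
    def blended_score(post: dict) -> float:
        return (0.5 * relevance_fn(query, post["text"])
                + 0.3 * post["launch_popularity"]    # assumed to be in [0, 1]
                + 0.2 * post["user_credibility"])    # assumed to be in [0, 1]
    return sorted(posts, key=blended_score, reverse=True)
```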

Cory Crapes

@david_karam well played, launching on March 14th.

David Karam

@cory_crapes appreciated <3

Lee Jackson

Pi 6 has made my Raspberry Pi projects so much easier! 💡 Loving the simplified workflow. Like if you're all about seamless tech! Wishing you successful builds!