👋 Hi Product Hunt makers! I'm Dhruv, founding team member at Pi Labs. We've heard from all our clients that the hardest problem in evals isn't the tooling or the workflow; it's knowing what good looks like and how to measure it. Our second product launch, the Pi Copilot, is built to solve exactly this, and we're keen to get feedback from the maker community.

• The copilot builds your first set of eval metrics within seconds, so you and your team don't have to spend a lot of time brainstorming them. Instead of endlessly iterating on prompts to make your LLM-as-a-judge "work", watch the copilot write qualitative checks for you, plus Python code for more objective metrics.

• These evals use our proprietary Pi Scorer language models: small, fast encoder models trained specifically for scoring that let you assess 20+ quality dimensions in under 100 ms. They provide faster, more consistent scoring than LLM-as-a-judge.

• Pi's scoring models can be calibrated with human feedback. Whether you use manual calibration, labeled data, or preference pairs, your scoring system adjusts to your and your users' preferences, creating robust feedback loops for your application.

• Because our scorers are so fast and lightweight (under 100 ms), they can be used beyond evaluation, from reward modeling for RL to online control flow with agents.
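
To make the "Python code for more objective metrics" idea concrete, here's a minimal sketch of what a few code-based checks and a weighted overall score could look like. This is not Pi's API and not actual copilot output: every function name, check, and weight below is a hypothetical chosen for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical objective checks. Each returns a score in [0.0, 1.0].
# None of these names come from Pi; they are illustrative only.

def within_word_limit(text: str, limit: int = 50) -> float:
    """1.0 if the response has at most `limit` words, else 0.0."""
    return 1.0 if len(text.split()) <= limit else 0.0

def has_no_placeholder(text: str) -> float:
    """0.0 if the response still contains obvious template residue."""
    lowered = text.lower()
    return 0.0 if "todo" in lowered or "[insert" in lowered else 1.0

def ends_with_punctuation(text: str) -> float:
    """1.0 if the response ends with sentence-final punctuation."""
    return 1.0 if text.rstrip().endswith((".", "!", "?")) else 0.0

# Check name -> (check function, weight). Weights are assumptions.
CHECKS: Dict[str, Tuple[Callable[[str], float], float]] = {
    "within_word_limit": (within_word_limit, 1.0),
    "has_no_placeholder": (has_no_placeholder, 2.0),
    "ends_with_punctuation": (ends_with_punctuation, 1.0),
}

def score(text: str) -> float:
    """Weighted average of all check scores, in [0.0, 1.0]."""
    total_weight = sum(weight for _, weight in CHECKS.values())
    weighted_sum = sum(fn(text) * weight for fn, weight in CHECKS.values())
    return weighted_sum / total_weight

if __name__ == "__main__":
    print(score("Thanks for reaching out. We will reply soon."))
    print(score("TODO write reply"))
```

Checks like these are deterministic, so the same response always gets the same score; qualitative dimensions such as tone or helpfulness are where a trained scoring model would come in.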

Try Pi out at https://withpi.ai, no sign-in required. Have Pi build your first scoring system in seconds and start optimizing your AI right away. You can also visit https://code.withpi.ai for our API reference and links to end-to-end tutorials and notebooks that show you how to use these techniques in real-world examples.

Reviews of Pi

Johanna H

•1 review

I’ve worked with the team behind this and can vouch for their technical brilliance. This scoring system is flexible, precise, and incredibly useful for evaluating creative or AI-driven workflows, including my multi-agent setup. Highly recommend!

24d ago

Ajay Sahoo

Launching soon!

•268 reviews

The developer team gets a better review process, since it makes checks smoother by eliminating trial and error. This approach enables deterministic evaluations in minutes, removing the guesswork about whether model improvements are actually working.

22d ago

Pi

ML & Data Science toolkit; built for Software Engineers.
