LLMs change fast — GPT-4 updates silently, models vanish, and prompts break.
PromptPerf helps you stay ahead: run a prompt across GPT-4o, GPT-4, and GPT-3.5, then compare each model's output to your expected result with similarity scoring.
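For a rough idea of what one test run does under the hood, here's a minimal Python sketch: send the same prompt to each model, then score each reply against the expected output. The OpenAI SDK calls, model list, and SequenceMatcher metric are illustrative assumptions, not PromptPerf's actual internals.

```python
# Minimal sketch of a cross-model prompt test, assuming the official
# OpenAI Python SDK and an OPENAI_API_KEY in the environment.
# The similarity metric here is a stand-in; PromptPerf may score differently.
from difflib import SequenceMatcher

from openai import OpenAI

client = OpenAI()
MODELS = ["gpt-4o", "gpt-4", "gpt-3.5-turbo"]  # assumed model IDs


def run_prompt(model: str, prompt: str) -> str:
    """Send the same prompt to one model and return its reply text."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""


def similarity(a: str, b: str) -> float:
    """Crude text similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


prompt = "Summarize in one sentence: The quick brown fox jumps over the lazy dog."
expected = "A fox jumps over a dog."

for model in MODELS:
    output = run_prompt(model, prompt)
    print(f"{model}: similarity={similarity(output, expected):.2f}")
```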
✅ 3 test cases per run, unlimited runs
✅ CSV export
✅ Built-in scoring
More models and batch runs are coming soon: one new feature ships for every 100 new users.
Built solo. Feedback welcome 🙏 promptperf.dev