LLMs change fast: GPT-4 updates silently, models vanish, and prompts break.
PromptPerf helps you stay ahead by running your prompt across GPT-4o, GPT-4, and GPT-3.5 and comparing each output to your expected result with similarity scoring.
✅ 3 test cases per run, unlimited runs
✅ CSV export
✅ Built-in scoring
More models and batch runs are coming soon; one new feature ships for every 100 users.
Built solo. Feedback welcome → promptperf.dev