Harshil Siyani

Still manually tweaking AI prompts & praying for the best?

I've completed a proof of concept for my AI evaluation tool - it successfully tests prompts across configurations (Models x Temps x Runs). https://promptperf.dev/

Hi, I am building an AI prompt-optimization tool that runs your prompt through multiple AI models, across various temperature settings, for X runs each.
This saves the time of running all those tests by hand. For example, say a user wants to test 3 questions/prompts against an expected answer across 4 models, with 3 temperature options, and each test repeated at least 10 times to ensure consistency.
That is 3 x 4 x 3 x 10 = 360 API calls, plus reporting and analysis of each result. Now imagine doing this repeatedly for multiple prompts: depending on your requirements, that can mean hundreds or thousands of calls every time.
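To make that arithmetic concrete, here is a quick sketch in plain Python. The numbers mirror the example above; the prompt and model names are just illustrative, nothing here is PromptPerf-specific:

```python
from itertools import product

prompts = ["Q1", "Q2", "Q3"]                      # 3 questions/prompts
models = ["gpt-4o", "claude", "gemini", "llama"]  # 4 models (illustrative names)
temperatures = [0.0, 0.7, 1.0]                    # 3 temperature options
runs = 10                                         # repeats per configuration

# Every (prompt, model, temperature) combination, repeated `runs` times:
configs = list(product(prompts, models, temperatures))
total_calls = len(configs) * runs
print(total_calls)  # 3 * 4 * 3 * 10 = 360 API calls for one test session
```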

How It Works:
1. Define Your Prompt

  • Input the prompt you want to optimize—whether it's a question, instruction, or complex task for the AI.

2. Set Expected Output

  • Specify what an ideal response looks like—this becomes the benchmark against which all AI outputs are compared.

3. Select Models

  • Choose which AI models you want to evaluate from our supported options like GPT-4o, Claude, Gemini, and more.

4. Configure Settings

  • Set the temperature values and the number of runs for each configuration to explore different creativity vs. precision trade-offs.

5. Run Tests

  • PromptPerf automatically executes all tests across your specified models, temperatures, and run counts.

6. Analyze Results

  • Review comprehensive performance data showing which configuration delivers the best match to your expected output (see the sketch after this list).
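To make the six steps above concrete, here is a minimal end-to-end sketch in Python. Everything in it is hypothetical: `call_model` is a placeholder for whichever provider SDK you actually use, the model names are illustrative, and exact-match scoring is just the simplest way to compare outputs against the expected answer.

```python
import itertools
import random

def call_model(model: str, prompt: str, temperature: float) -> str:
    """Placeholder for a real provider SDK call (OpenAI, Anthropic, etc.)."""
    # Returning canned variants here just lets the sketch run end to end.
    return random.choice(["Paris", "paris", "The capital is Paris."])

# Steps 1-2: define the prompt and the expected (benchmark) output.
prompt = "What is the capital of France? Answer with one word."
expected = "Paris"

# Steps 3-4: select models, temperatures, and runs per configuration.
models = ["gpt-4o", "claude", "gemini"]
temperatures = [0.0, 0.7, 1.0]
runs = 10

# Step 5: execute every (model, temperature) configuration `runs` times.
scores = {}
for model, temp in itertools.product(models, temperatures):
    hits = sum(
        call_model(model, prompt, temp).strip().lower() == expected.lower()
        for _ in range(runs)
    )
    scores[(model, temp)] = hits

# Step 6: rank configurations by how often they matched the expectation.
for (model, temp), hits in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{model} @ temp={temp}: {hits}/{runs} exact matches")
```

Real scoring could of course be fuzzier than exact match (similarity metrics, an LLM judge), but the loop structure over models, temperatures, and runs stays the same.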

I am looking for feedback and early testers who would be interested. The proof of concept is done, and I am about 1-2 weeks away from an early-access/MVP launch.
https://promptperf.dev/
