AutoArena

Automated GenAI evaluation that works

AutoArena is an open-source tool that automates head-to-head evaluations using LLM judges to rank GenAI systems. Quickly and accurately generate leaderboards comparing different LLMs, RAG setups, or prompt variations, and fine-tune custom judges to fit your needs.
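For a concrete sense of what "head-to-head evaluation with an LLM judge" means, here is a minimal, self-contained sketch of the underlying idea: every pair of systems is compared on shared prompts, a judge picks a winner, and Elo-style ratings produce the leaderboard. This is an illustration only, not AutoArena's actual API; the `judge` stub, model names, and prompts are all hypothetical.

```python
# Sketch of LLM-judged head-to-head evaluation with Elo ranking.
# NOT AutoArena's API: `judge` is a hypothetical stand-in for an LLM call.
import itertools
import random

def judge(prompt: str, response_a: str, response_b: str) -> str:
    """Hypothetical LLM judge: returns 'A' or 'B' for the better response.
    A real judge would call an LLM with a grading prompt; here we choose
    randomly so the sketch runs without API keys."""
    return random.choice(["A", "B"])

def update_elo(r_a: float, r_b: float, winner: str, k: float = 32.0):
    """Standard Elo update after one head-to-head comparison."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if winner == "A" else 0.0
    r_a += k * (score_a - expected_a)
    r_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a, r_b

# Candidate systems and their responses per prompt (illustrative data).
responses = {
    "model-x": {"What is RAG?": "Retrieval-augmented generation is ..."},
    "model-y": {"What is RAG?": "RAG combines retrieval with ..."},
    "model-z": {"What is RAG?": "A technique where an LLM ..."},
}
ratings = {name: 1000.0 for name in responses}

# Judge every pair of systems head-to-head on every shared prompt.
for a, b in itertools.combinations(responses, 2):
    for prompt in responses[a].keys() & responses[b].keys():
        winner = judge(prompt, responses[a][prompt], responses[b][prompt])
        ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], winner)

# Leaderboard: highest rating first.
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.0f}")
```

Pairwise judging like this sidesteps the need for absolute scores: the judge only has to say which of two responses is better, which LLMs tend to do more reliably than grading on a fixed scale.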

Geva Perry (Hunter)
Hey Product Hunt! 👋 I'm thrilled to introduce AutoArena! It's an open-source tool that takes the hassle out of comparing generative AI systems, making the process faster, cheaper, and more accurate.

AutoArena uses LLM judges to automate and scale head-to-head evaluations, allowing you to quickly generate leaderboards for different GenAI models, RAG setups, or even specific prompt tweaks. Whether you're evaluating multiple models or experimenting with new prompt variations, AutoArena can help you clearly understand which setups perform best, without manual scoring. Plus, you can fine-tune your own LLM judges to tailor evaluations to your unique requirements.

I'm excited to see how this will help the community push the boundaries of generative AI! And congrats to the team: @gordonhart, @moelgendy, @andrewshishi, and @skip_everling.

Give it a try, and I'm sure the team would love any feedback, thoughts, or ideas for new features! You can download it from the GitHub repo and install it locally, or sign up for the hosted version at https://www.autoarena.app. Check it out and let's automate those evals! 🤖🎉
Ryan Hendrickson
@gevaperry Super cool, guys! I like the fact that you can fine-tune the LLM judges based on the evaluations! Just a quick heads-up: I tried to go to your website through the link in your comment, but it brought me to a "not found" page. Looks like it should be https://www.autoarena.app/ instead of .com. I wish you the best of luck with your launch!
Geva Perry
@ryan_hendrickson thanks so much! The link is fixed now.