Find and fix AI mistakes at scale, and build more reliable GenAI applications. Use our LLM-as-a-Judge to test and evaluate prompts and model versions.
Selene by Atla
Hey Product Hunt! Maurice here, CEO and co-founder of Atla.
At Atla, we’re a team of researchers and engineers dedicated to training models and building tools that monitor AI performance.
If you’re building with AI, you know that good evals are critical to ensuring your AI apps perform as intended.
Turns out, getting accurate evals that assess what matters for your use case is challenging. Human evaluations don't scale, and general-purpose LLMs are inconsistent evaluators. We've also heard that default eval metrics aren't precise enough for most use cases, and prompt engineering custom evals from scratch is a lot of work.
🌖 Our solution
Selene 1: an LLM Judge trained specifically for evals. Selene outperforms all frontier models (OpenAI’s o-series, Claude 3.5 Sonnet, DeepSeek R1, etc.) across 11 benchmarks for scoring, classifying, and pairwise comparisons.
Alignment Platform: a tool that helps users automatically generate, test, and refine custom evaluation metrics from just a description of their task, with little-to-no prompt engineering required.
🛠️ Who is it for?
Builders of GenAI apps who need accurate and customizable evals—whether you’re fine-tuning LLMs, comparing outputs, or monitoring performance in production. Evaluate your GenAI products with Selene and ship with confidence.
You can start with our API for free. Our Alignment Platform is available for all users.
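For anyone wondering what an LLM-as-a-Judge call looks like in practice, here is a rough sketch of the single-response scoring case. To be clear, the package name, client, method, and response fields below are illustrative assumptions, not the documented SDK surface; check the Atla docs for the real interface.

```python
# Illustrative sketch only: the "atla" package, client, method, and response
# fields below are assumptions for illustration, not a confirmed SDK surface.
from atla import Atla  # assumed package/client name

client = Atla(api_key="YOUR_ATLA_API_KEY")  # assumed constructor

# Score one model response against a task-specific criterion.
result = client.evaluation.create(          # assumed method name
    model_id="atla-selene",                 # assumed model identifier
    model_input="What is the capital of France?",
    model_output="The capital of France is Paris.",
    evaluation_criteria="Score 1-5 for factual accuracy and directness.",
)

print(result.evaluation.score)     # assumed field: numeric judgment
print(result.evaluation.critique)  # assumed field: natural-language rationale
```

The same pattern would extend to classification (pass/fail criteria) and pairwise comparison (two candidate outputs judged against each other), which are the other two eval modes mentioned above.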
We’d love your feedback in the comments! What challenges have you faced with evals?
Selene by Atla
@masump Hey Masum! Selene won't adapt itself out of the box, but we've built the Alignment Platform to make it easy to continually align your LLM judge to changing requirements.
Fedica
Too many AI tools, outputs, and constant tweaks can be a lot, especially when you're racing to launch. Having precise evaluations without handcrafting endless prompts sounds like a dream. I like the idea of freeing myself up to actually focus on strategy instead of chasing down inconsistencies. Super intrigued!
Selene by Atla
@jonwesselink Thank you Jon! Would be excited to help you with evals at Fedica!
This is actually super interesting, and I'll check it out!
Selene by Atla
@mia_k1 Thank you! Let us know if you have any questions