
Launched on December 19th, 2024
How do you validate an AI agent that could reply in unpredictable ways?
My team and I have released Agentic Flow Testing, an open-source framework where one AI agent autonomously tests another through natural-language conversations.
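To make the idea concrete, here is a minimal sketch of such a simulated conversation loop. The function and parameter names are hypothetical, not LangWatch's actual API: a tester agent plays the user, the agent under test replies, and a judge scores the transcript against the scenario's success criteria.

```python
# Minimal sketch of the agent-tests-agent idea (hypothetical names, not the
# LangWatch API): a "tester" LLM plays the user, the agent under test replies,
# and a judge checks each exchange against the scenario's success criteria.

def simulate_conversation(tester, agent_under_test, judge, scenario, max_turns=5):
    transcript = []
    message = scenario["opening_message"]
    for _ in range(max_turns):
        reply = agent_under_test(message)                   # the app being validated
        transcript.append({"user": message, "agent": reply})
        verdict = judge(transcript, scenario["criteria"])   # pass/fail + reasoning
        if verdict["done"]:
            return verdict
        message = tester(transcript, scenario["goal"])      # next user-style probe
    return judge(transcript, scenario["criteria"])
```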
Every day I speak with AI teams building LLM-powered applications, and something is changing.
I see a new role quietly forming:
The AI Quality Lead as the owner of quality.
LangWatch provides an easy-to-use, open-source platform to improve and iterate on your current LLM pipelines, as well as mitigate risks such as jailbreaking, sensitive data leaks, and hallucinations.
Awesome to see more products launch in this field. One of the biggest challenges for companies creating AI solutions is having the right platform in place as a starting point. Can't wait to explore this product even more for my own team!
I highly recommend LangWatch to anyone looking to elevate their AI-generated content with precision and effectiveness.
We've used LangWatch for output monitoring and evaluation of our RAG application, and I can't recommend it enough. We find value in everything from iterative evaluation with tools like DSPy and RAGAS to production features like jailbreak detection and document & topic tracking, all with a great dashboard and UI. The team has built a great product, and they're very responsive and helpful.
Helped me personally with my AI project. No more AI black box - powering decisions with insights. It helps mitigate safety risks and shows exactly where the bot is hallucinating, which increases quality. It also makes safeguarding against malicious practices like jailbreaking possible. All in all, a wonderful tool for anyone working with LLMs.