LLM Beefer Upper

Automate Chain of Thought with multi-agent prompt templates

Simplify automating critique, reflection, and improvement, aka getting the model to 'think before it speaks', for far superior results from generative AI. Choose from pre-built multi-agent templates or create your own with the help of Claude Sonnet 3.5.

Lee Mager
I built this app for myself because I was getting bored of having to hunt down my prompt templates and copy/paste them to take advantage of the chain-of-thought / critique / reflection / improvement boost from LLMs. It automates the multi-agent process and makes it easy to add and refine prompt templates. Each agent is 'dedicated' to a task, e.g. accuracy verification, improvement suggestions, or final polishing, and each one's output is displayed so you can see the AI 'showing its working'. I use it constantly for tasks where I want the best results and I personally can't get enough of it. It's expensive ($0.75 for the best quality 4-agent run *** UPDATE: following feedback, I've cut pricing by 33%, so the best quality run is now $0.50) because it absolutely demolishes tokens, but for me it's a no-brainer. A few people saw it and basically demanded I make it available publicly, so that's why it's here!
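For anyone curious what that multi-agent flow looks like under the hood, here's a minimal sketch. This is an illustration, not the app's actual code: the agent names, prompts, and the `call_llm` hook are hypothetical stand-ins for whatever model API is used, but the draft → verify → improve → polish chaining is the pattern described above.

```python
def run_chain(task, call_llm):
    """Run a 4-agent critique/reflection/improvement chain.

    call_llm(system_prompt, context) -> str is any chat-model call
    (e.g. a wrapper around an LLM API). Returns a list of
    (agent_name, output) pairs so every stage is visible, mirroring
    how each agent's work is displayed in turn.
    """
    # Hypothetical agent roles; the real templates are configurable.
    agents = [
        ("Drafter", "Produce a first draft for the task."),
        ("Accuracy verifier", "Check the draft for factual and logical errors; list issues."),
        ("Improver", "Suggest concrete improvements addressing the issues found."),
        ("Polisher", "Rewrite the draft applying the improvements; output only the final text."),
    ]
    transcript = []
    context = f"Task: {task}"
    for name, system_prompt in agents:
        output = call_llm(system_prompt, context)
        transcript.append((name, output))
        # Each later agent sees all prior work, so critique and
        # reflection compound rather than starting fresh.
        context += f"\n\n{name} output:\n{output}"
    return transcript
```

The key design point is the growing `context`: the polisher sees the draft, the verifier's issue list, and the improver's suggestions, which is what makes the final response better than the first.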
@leemager Great job! This looks revolutionary.
Kyrylo Silin
Hey Lee, I'm curious about the pre-built templates. What kinds of tasks or industries are they designed for? Given the high token usage, have you considered any options for reducing costs, like caching frequent requests? Congrats on the launch!
Lee Mager
@kyrylosilin Hey Kyrylo and thanks! I haven't yet experimented with potential cost-cutting, but it's definitely something I need to think about. I'm firm on only using the best LLM, though, because otherwise the main value of the app - getting the best possible result without manually prompting for the additional critique/reflection/improvement stages - would deteriorate. I've experimented with GPT-4o mini instead of Claude Sonnet 3.5, and the results are impressive relative to GPT-4o mini on its own, but not up to the standard I'm used to. Here's an experiment I did recently on a really quite complex knowledge work task that GPT-4o and Claude Sonnet 3.5 don't do well enough at on their own (they miss key points), where adding the 4 agents to think it through carefully made a massive improvement: https://llmbeeferupper.com/artic.... This is the kind of higher-quality task the app is focused on, and I personally wouldn't want to dilute it with a cheaper model for now. That said, the feedback I'm getting in the post-task survey results right now is pretty clear - 100% have said that the final agent's response is better than the first, but <50% say the cost is worth it. So I will need to think about making this cheaper for sure.
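The caching idea Kyrylo raises could look roughly like this: key each response by model plus prompt, so identical requests within or across runs don't re-spend tokens. A hypothetical in-memory sketch (class and parameter names are illustrative, not from the app):

```python
import hashlib
import json

class ResponseCache:
    """Cache LLM responses keyed by (model, system prompt, message),
    so repeated identical calls return the stored result instead of
    hitting the model again and burning tokens."""

    def __init__(self):
        self._store = {}

    def _key(self, model, system_prompt, user_message):
        # Deterministic digest of the full request identity.
        payload = json.dumps([model, system_prompt, user_message])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, system_prompt, user_message, call_llm):
        key = self._key(model, system_prompt, user_message)
        if key not in self._store:
            # Cache miss: pay for one real model call, then store it.
            self._store[key] = call_llm(system_prompt, user_message)
        return self._store[key]
```

In practice this would want persistence and an expiry policy, and it only helps when requests actually repeat, but it's the cheapest lever before switching to a smaller model.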
Lee Mager
@kyrylosilin Regarding the prebuilt templates, I'm always expanding them based on requests I get. I work in higher education, so I have some like dissertation planning, exam question drafting, critical feedback on drafts, targeted study notes from an academic paper, marker guides for exam papers, curriculum and lesson plans, etc. But also tasks like drafting blogs, application cover letters, project planning, risk analysis, FAQ generation, etc. All the usual kinds of language tasks that LLMs are great at, and the multi-agent reasoning steps just make them a lot better :)
Lee Mager
I've also added an automation script template for things like Python, PowerShell, and VB. I won't be doing anything for software dev because the token length makes that impossible and it would only disappoint people. But for scripts of 200 lines or fewer, that one works well. The day Claude (or GPT-5) can handle millions of tokens and not suck, I will absolutely be adding some actual software dev templates!
blank
This is a fantastic initiative, @leemager! The way you've simplified the automation of critique and reflection in generative AI could be a game-changer for anyone looking to enhance their output. Love the idea of having dedicated agents for specific tasks like accuracy verification and improvement suggestions; it really mimics a more interactive and thoughtful creative process. I can see how this would seriously save time for many Makers and help elevate the quality of content generated. Plus, the transparency of seeing the AI 'show its working' is such a clever touch—definitely aligns well with the principles of iterative improvement! While the cost may seem high at first glance, the ROI on quality results and enhanced efficiency might just make it worthwhile for a lot of users. Excited to see more feedback once it's live!