Najwa Assilmi

Latency, cost, accuracy: pick two?🐰

We asked Gemini 2.5 and Claude 3.7 the same brain-twister:
“If Alice is twice as old as Bob was…” (you know the one 👵👦)

Both got it right.
But here’s what we’re wondering 👇

When you’re looking at LLM performance, what metric should come first?

  • Latency?

  • Token usage?

  • Cost?

  • Hallucination risk?

  • Just… vibes?

We’re building a monitoring layer on Intura to make this easy (and kinda fun).

What would you want to see first when your AI goes rogue?

Drop it in the replies 👇
#LLM #Monitoring #AItools #PromptEngineering #Intura

Replies
Charles Maddock
Personally, I would always put an Anthropic model in front of the user, since they are so much better at picking up the subtle nuances of a request! For anything happening behind the scenes, though, I would just use the cheapest model that gets the work done satisfactorily. In many cases, GPT-4o or Gemini 2.0 works fine.