Nika

Do you trust any particular LLM? (What are your preferences among AI solutions?)

Yesterday, Meta announced Llama 4, a new collection of AI models in its Llama family.

(It consists of Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth.)


Historically, OpenAI with its ChatGPT has been on the market the longest.

It has the biggest market share (77.66%), and people have simply grown used to it.


DeepSeek also experienced a huge boom, but its Chinese origins raised concerns about data handling.


Some people lean toward Grok, though they are also beginning to doubt whether Musk will start applying some form of censorship and filtering of information.


Gemini, Perplexity & Claude are also frequently cited models, but I don't feel they are used as much.


Which LLM do you use and trust, and why did you decide on that particular one?

Sergei Zotov

The only LLM I truly trust is the one that works solely on my computer without an Internet connection.


Currently, my stack is:

  • General: Microsoft's phi4

  • Coding: Qwen2.5-Coder:32B

  • Pure instruction-following: Qwen2.5:32B

  • Reasoning (I don't use it much, but...): Deepseek R1:7B

Nika

@zot Yeah, I have heard that more and more people prefer to run them locally. Understandable.

Konrad S.

@zot there may be an alternative to that now: @Privatemode AI, I think they'll have Llama 4 soon

Sergei Zotov

@konrad_sx judging by what they say on their website, I'm very doubtful about this technology. There are ways to do this correctly, though. I bet they'll pivot to that in the future

Konrad S.

@zot how could it be done better? The only possibility I see would be fully homomorphic encryption, and that's much too slow currently.

Sergei Zotov

@konrad_sx I believe in on device/self-hosted data obfuscation/masking.


For example, a person sells their property to a company, and the contract itself is fairly basic, but we just don't want to send personal data to a third-party app:


  • We need to define what data we should mask. Let's say it's this:

person: Sergei Zotov

address: 9351958120 Maple Avenue, Apt 12B, Chicago, IL 60614, USA

company: Apple Inc.

  • We can replace all of this on device with something like this:

person: John Doe

address: 456 Elm Street, Suite 3, Los Angeles, CA 90001, USA

company: Monsters, Inc.

  • Send the obfuscated document into any API we want

  • Convert the obfuscated data in the processed document back to the original values on device

We can also do it with numbers (e.g. apply a multiplier/divisor).


Btw, this obfuscation can be done even with basic text models, so it won't take many resources on device / in a self-hosted solution
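The mask / call API / unmask loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a real implementation: the entity table, placeholder values, and `PRICE_MULTIPLIER` are assumptions invented for the example, and a real pipeline would detect entities with a small local NER model rather than a hardcoded dict.

```python
# Illustrative sketch of on-device masking before calling a third-party LLM API.
# Mapping and multiplier are made up for this example.
SENSITIVE_TO_PLACEHOLDER = {
    "Sergei Zotov": "John Doe",
    "9351958120 Maple Avenue, Apt 12B, Chicago, IL 60614, USA":
        "456 Elm Street, Suite 3, Los Angeles, CA 90001, USA",
    "Apple Inc.": "Monsters, Inc.",
}
PRICE_MULTIPLIER = 3  # scale monetary amounts before they leave the device


def mask(text: str) -> str:
    """Replace sensitive values with placeholders, on device."""
    for real, fake in SENSITIVE_TO_PLACEHOLDER.items():
        text = text.replace(real, fake)
    return text


def unmask(text: str) -> str:
    """Restore the original values in the processed document, on device."""
    for real, fake in SENSITIVE_TO_PLACEHOLDER.items():
        text = text.replace(fake, real)
    return text


doc = "Sergei Zotov sells the property to Apple Inc."
masked = mask(doc)         # safe to send to any third-party API
restored = unmask(masked)  # back to the real values, on device

amount = 500_000
obfuscated = amount * PRICE_MULTIPLIER     # what the API would see
original = obfuscated // PRICE_MULTIPLIER  # restored after processing
```

The same reverse-mapping trick works on the API's response, as long as the remote model leaves the placeholder strings intact.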

André J

@zot When computers get 1 TB of RAM, 64-core CPUs, and an H100 graphics card, I will join you. I do think the cloud models will always outpace local machines, though 😅

Sergei Zotov

@sentry_co yeah, to get the performance of GPT-4.5 or Claude 3.7 you really need all of that 😅


But I think we're headed away from the freakishly-large-model-that-knows-everything toward smaller models for specific use cases.


For example, Qwen2.5-Coder:32B performed almost on par with 4o in coding when it was first introduced. And it only takes ~20 GB of GPU memory to run. Any MacBook Pro with Apple Silicon and 32 GB of RAM can run it pretty well, and if not, you can always use a version with fewer parameters (the 7B and 14B are also good)


I feel the same way about phi4 - yeah, it can't work well across multiple languages, but it's a pretty good model for general use cases in English. And so on.


In the future, I believe we're going to have some sort of router system that will just pull the best model possible for your query. It will preserve your GPU memory and get you results similar to proprietary models. Actually, that's a cool product idea...
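A toy sketch of what such a router could look like: the model names echo the local stack listed earlier in the thread, but the keyword heuristic is purely illustrative (a real router would likely use a small classifier model or embeddings to pick the route).

```python
# Hypothetical query router: pick a local model per query type.
# Model names follow the stack mentioned above; hints are illustrative.
ROUTES = {
    "coding": "qwen2.5-coder:32b",
    "reasoning": "deepseek-r1:7b",
    "general": "phi4",
}

CODING_HINTS = ("code", "function", "bug", "python", "compile")
REASONING_HINTS = ("prove", "step by step", "why", "logic")


def route(query: str) -> str:
    """Return the name of the local model best suited to the query."""
    q = query.lower()
    if any(hint in q for hint in CODING_HINTS):
        return ROUTES["coding"]
    if any(hint in q for hint in REASONING_HINTS):
        return ROUTES["reasoning"]
    return ROUTES["general"]
```

Only the chosen model would be loaded into GPU memory for that query, which is what makes the idea attractive on a single consumer machine.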

André J

Cursor has a router built in now. It uses the appropriate model for the task at hand.


I think local will be very interesting soon, especially since OpenAI is not opening its latest image model for API access. Probably because it will cause havoc in many industries, because it's so good. But local models can't be far behind, so the havoc will come anyway. It's just that companies can't facilitate it, so people will have to install these things themselves. Which opens up huge opportunities to create the Cursor for local, or the local AI Photoshop, etc.


And then there is MCP. And agentic flows. All thrive locally and require only basic hardware.


Good times ahead!

Stain Lu
  • for most cases we use Claude 3.5/3.7 Sonnet, which is stable, fast, and smart. TBH, in creative writing, 3.5 and 3.7 don't seem much different.

  • we are also playing around with Gemini 2.5 Pro, which is sometimes significantly better than the Claude models in longer writing, but the problem is that it is also significantly slower

  • for even harder tasks that require reasoning, our default choice is DeepSeek R1 instead of the OpenAI o series, because the latter only excels in benchmarks; its answers seem so 'over-aligned' and often just give us long but useless bullshit. R1 actually feels much smarter; I think the reason is on the data side.

  • lastly, please allow me to advertise Grimo (Cursor for Writing), the only place where you can easily switch among all those latest models for FREE, even reasoning models like DeepSeek R1 and OpenAI o3-mini, all in a pretty nice editor where you can literally quote any part of the doc (a ChatGPT Canvas alternative, maybe).


Stain Lu

and btw, Llama 4 is a fraud; its performance is even worse than Llama 3's

Mina Cheragh

@stainlu 

Oh really? I was about to test it. What scenarios did you use to test Llama 4's performance?

Stain Lu

@mina_cheragh structured output, command understanding, and creative writing

Mina Cheragh

@stainlu 

It's a pretty solid test.

Any AI models you prefer for executing these tasks?

Nika

@stainlu I like the idea behind Grimo. Yesterday or the day before, I saw a post from @mina_cheragh where she also mentioned the option to have all LLMs in one place.

Stain Lu
@busmark_w_nika glad you like it! we actually consider models as just another parameter of the settings, because prompts + models can unlock lots of possibilities (GPTs are nice, actually)
Mina Cheragh

@busmark_w_nika  @stainlu 

Yes. There are solutions out there but curious to know if you have used any of them.

Stain Lu
@mina_cheragh we have used all of them, like cherry studio, chatwise, and a lot more solutions targeting developers
Hyuntak Lee

I use ChatGPT's o1 pro mode as a virtual consultant. I've tried, and still run, deep research in other models with the same task, but I'll say o1 pro mode is more to my taste.

Nika

@hyuntak_lee What do you think about the above-mentioned deepseek? Do you trust the service?

Hyuntak Lee

@busmark_w_nika I don't want to judge the service, but I'll say that looking, from a technical point of view, at the way it encrypts and protects user information hugely deterred me from using it

André J

Might be a fluke, but I asked Llama 4 the other day about itself. It didn't know anything about the existence of Llama 4 😅. Some reports from friends in the AI business claim Llama 4 isn't living up to the hype. But I love the competition!

Nika

@sentry_co TBH, I haven't tried it yet. I think the only Meta AI I've used was their AI option within Facebook (or Instagram) last week, and it failed in Slovak :D It was after 10pm, so I didn't bother to speak English :D

André J

🧸 Here you go honey 🍯 alpaca.chat 👈 (test llama 4 for free)

Matt McDonagh

Gemini Flash 2.0 has been amazing for us.


Massive context, high precision, doesn't speak robotically, uber cheap.

Nika

@matt__mcdonagh So do you use it more than ChatGPT? :)

Matt McDonagh

@busmark_w_nika Correct, we don't use OAI at all.


Google nailed it.

Kirill Golubovskiy

I use ChatGPT because it fully handles the tasks I’m comfortable delegating to AI.

I’ve tried others, but it’s the only one I’m willing to pay for.

Nika

@kirill_golubovskiy It's the same for us. We pay only for ChatGPT + Visual Electric for images, but with the new, improved model that creates better images, I think we'll go team ChatGPT (all in).

Isha Nasir

DeepSeek is good, BTW. Is it true that DeepSeek data originates in China?

When ChatGPT was launched, it got the hype and users within a short time, but I didn't think any other AI tool would get that kind of popularity.

Ursus Viola

ChatGPT has the advantage of being the first conversational AI that was easily available for people to use, and OpenAI's marketing and development efforts are also why it has such a large share of the market, as GPT keeps updating and keeps up with trends faster than the others. If we talk about trust, then I wouldn't recommend trusting any of them, as all of them are training on our data and each of our prompts is helping them improve. Dunno if that's a good thing or bad.

Nika

@ursus_viola Totally agree, though ChatGPT already knows me better than I know myself :D

Elissa Craig

I have a preference towards ChatGPT. In my case, it 100% had that first mover advantage.

I have been using Gemini more and more, but that is because Google is building it into the Google Suite tools. I currently pay for Grammarly PRO and have been considering churning, since the Gemini tool is becoming more accessible, is free, and the output is just as good. I would expect that adding Gemini into Google products, like Google Docs, will increase usage - although it's unlikely to get the same direct volume as other LLMs, imo.

Nika

@elissa_craig IMO, it was a great advantage for OpenAI to be first; in many cases people use it only because of that (historical background), and they are more likely to pay for GPT than for any other model.

Islam Akramov

Stepping back from the original question, I think one of the most fascinating parts of this whole space is that many of these AI projects don't fully understand the potential or limits of their own models.

That’s because a significant number of them are self-learning or highly adaptive systems, which means that even the creators are constantly researching and exploring the behaviors of their own models.

What I’m getting at is this: while it's popular to compare LLMs based on performance, benchmarks, or niche use cases, these comparisons often rely on incomplete or evolving research.
So drawing strong conclusions from such research can be misleading — especially when the models themselves are still changing and being discovered in real time.

This doesn't mean we shouldn't compare or evaluate. It just means we should take all those conclusions with a grain of salt — and remain open to the idea that what we think a model is good at today might not hold tomorrow.

Jhon Smeth

Great breakdown, Nika! It’s interesting to see how the landscape is evolving with each new release. Personally, I lean toward OpenAI's ChatGPT (especially GPT-4) for its balance of reasoning, creativity, and safety. It’s been refined over time and seems to offer the most consistent performance across a wide range of tasks. That said, Llama 4 is definitely one to watch — Meta’s new lineup sounds ambitious, especially with names like “Behemoth” 👀

Grok has potential, but I share your hesitation about potential censorship. Same goes for DeepSeek — the performance looks promising, but the data origin questions still linger.

I think it’s similar to how people pick AI image enhancers: some go for the most polished (like Remini or Let’s Enhance), while others explore open-source options that give them more control. Ultimately, it’s all about trust, transparency, and how much creative freedom you want from the tool.

Which model are you finding most useful lately?

Reid Kimball
I use Claude and Gemini the most. More and more, I am using Gemini 2.5 Pro. Excellent coding knowledge, and it follows my instructions very well. Claude 3.7 is terrible at following instructions.