Do you trust any particular LLM? (Your preferences among AI solutions)
Yesterday, Meta announced a new collection of AI models in its Llama family: Llama 4.
(It consists of Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth.)
Historically, OpenAI with its ChatGPT has been on the market the longest.
It has the biggest share (77.66%), and people have simply gotten used to it.
DeepSeek also experienced a huge boom, but there were concerns about data because of its Chinese origins.
Some people are inclined toward Grok, though they are also beginning to doubt whether Musk will start applying some form of censorship or vetting of information.
Gemini, Perplexity & Claude are also frequently cited models, but I don't feel they are used that much.
Which LLM model do you use, trust and why have you decided on that particular LLM?
Replies
The only LLM I truly trust is the one that works solely on my computer without an Internet connection.
Currently, my stack is:
General: Microsoft's phi4
Coding: Qwen2.5-Coder:32B
Pure instruction-following: Qwen2.5:32B
Reasoning (I don't use it much, but...): Deepseek R1:7B
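If anyone wants to see what that looks like in practice, here is a minimal sketch, assuming the models above are pulled into a local Ollama install and the official ollama Python client is available; the exact model tags will depend on your setup:

```python
# Minimal offline sketch: every call stays on the machine.
# Assumes Ollama is running locally and these model tags have been pulled.
import ollama

def ask_local(model: str, prompt: str) -> str:
    """Send one prompt to a locally served model and return its reply."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

# Route by task, mirroring the stack above (tags are illustrative).
print(ask_local("phi4", "Summarize this note: ..."))                   # general
print(ask_local("qwen2.5-coder:32b", "Write a CSV parser in Python"))  # coding
print(ask_local("qwen2.5:32b", "Rewrite this email more politely"))    # instructions
print(ask_local("deepseek-r1:7b", "Plan a 3-step A/B test"))           # reasoning
```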
minimalist phone: creating folders
@zot Yeah, I've heard that more and more people prefer running them locally. Understandable.
App Finder
@zot there may be an alternative to that now: @Privatemode AI, I think they'll have Llama 4 soon
@konrad_sx judging by what they say on their website, I'm very doubtful about this technology. There are ways to do this correctly, though. I bet they'll pivot to that in the future
App Finder
@zot how could it be done better? The only possibility I see would be fully homomorphic encryption, and that's much too slow currently.
@konrad_sx I believe in on device/self-hosted data obfuscation/masking.
For example, a person sells their property to a company, and the overall contract is kinda basic. But we just don't want to send personal data to a 3rd party app:
We need to define what data we should mask. Let's say it's this:
person: Sergei Zotov
address: 9351958120 Maple Avenue, Apt 12B, Chicago, IL 60614, USA
company: Apple Inc.
We can replace all of this on device with something like this:
person: John Doe
address: 456 Elm Street, Suite 3, Los Angeles, CA 90001, USA
company: Monsters, Inc.
Send the obfuscated document to any API we want
Convert the obfuscated data in the processed document back to the original values on device
We can also do it with numbers (e.g. apply multiplier/divider).
Btw, this obfuscation can be done even with basic text models, so it won't take up many resources on device / in a self-hosted solution
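A rough sketch of that flow, just to make it concrete; the mapping is hand-written here, a real version would need NER or regexes to find the sensitive values, and the "API call" is a hypothetical placeholder:

```python
# Illustrative only: mapping, contract text, and the "API" are all placeholders.
MAPPING = {
    "Sergei Zotov": "John Doe",
    "9351958120 Maple Avenue, Apt 12B, Chicago, IL 60614, USA":
        "456 Elm Street, Suite 3, Los Angeles, CA 90001, USA",
    "Apple Inc.": "Monsters, Inc.",
}

def mask(text: str) -> str:
    # Replace each sensitive value with its stand-in before anything leaves the device.
    for real, fake in MAPPING.items():
        text = text.replace(real, fake)
    return text

def unmask(text: str) -> str:
    # Restore the original values in whatever the remote service sends back.
    for real, fake in MAPPING.items():
        text = text.replace(fake, real)
    return text

def call_remote_llm(text: str) -> str:
    # Hypothetical stand-in for the 3rd-party API call (e.g. an HTTP request).
    return "Reviewed contract:\n" + text

contract = "This agreement is between Sergei Zotov, residing at ..., and Apple Inc."
safe_copy = mask(contract)              # only the obfuscated text leaves the device
processed = call_remote_llm(safe_copy)
result = unmask(processed)              # real names never left the machine
print(result)
```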
@zot When computers get 1 TB of RAM, 64-core CPUs, and an H100 graphics card, I will join you. I do think the cloud models will always outpace local machines though 😅
@sentry_co yeah, to get the performance of GPT-4.5 or Claude 3.7 you really need all of that 😅
But I think we're moving away from one freakishly large model that knows everything and toward smaller models for specific use cases.
For example, Qwen2.5-Coder:32B had nearly the same coding performance as GPT-4o when it was first introduced, and it only takes ~20 GB of GPU memory to run. Any MacBook Pro with Apple Silicon and 32 GB of RAM can run it pretty well, and if not, you can always use a version with fewer parameters (the 7B and 14B are also good).
I feel the same way about phi4: yeah, it can't work well across multiple languages, but it's a pretty good model for general use cases in English. Etc. etc.
In the future, I believe we're going to have some sort of router system that will just pull the best model possible for your query. It will save your GPU memory and get you results similar to the proprietary models. Actually, that's a cool product idea...
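A toy sketch of what such a router could look like on a local box; the keyword matching below stands in for whatever small classifier a real product would use, and the model tags are just the Ollama-style names from earlier in the thread, so treat them as assumptions:

```python
# Naive local router: pick one model per query so only that model
# needs to sit in GPU memory at a time. Purely illustrative.
import ollama

ROUTES = {
    "code": "qwen2.5-coder:32b",
    "reasoning": "deepseek-r1:7b",
    "default": "phi4",
}

def pick_model(query: str) -> str:
    q = query.lower()
    if any(word in q for word in ("bug", "function", "refactor", "compile")):
        return ROUTES["code"]
    if any(word in q for word in ("prove", "plan", "step by step", "why")):
        return ROUTES["reasoning"]
    return ROUTES["default"]

def answer(query: str) -> str:
    model = pick_model(query)  # a real router would use a small classifier or embeddings
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": query}])
    return reply["message"]["content"]

print(answer("Why does this function throw a KeyError?"))  # routes to the coding model
```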
Cursor has a router built in now. It uses the appropriate model for the task at hand.
I think local will be very interesting soon, especially since OpenAI is not opening their latest image model for API access. Probably because it would cause havoc in many industries, because it's so good. But local models can't be far behind, so that means the havoc will come anyway. It's just that companies can't facilitate it, so people will have to install these things themselves. Which opens up huge opportunities to create the Cursor for local, or the local AI Photoshop, etc.
And then there is MCP. And agentic flows. All thrive locally and require only basic hardware.
Good times ahead!
Grimo
for most cases we use claude 3.5/3.7 sonnet, which is stable, fast and smart. tbh in creative writing, 3.5 and 3.7 don't seem much different.
we are also playing around with gemini 2.5 pro, which is sometimes significantly better than the claude models in longer writing, but the problem is that it is also significantly slower
for even harder tasks that require reasoning, our default choice is deepseek r1 instead of the openai o series, because the latter only excels in benchmarks; the answers just seem so 'over-aligned' and often give us long but useless bullshit. r1 feels much smarter actually, and i think the reason is on the data side.
lastly, please allow me to advertise Grimo (Cursor for Writing), the only place where you can easily switch among all those latest models for FREE, even reasoning models like deepseek r1 and openai o3-mini, all in a pretty nice editor where you can literally quote any part of the doc (a chatgpt canvas alternative, maybe).
Grimo
and btw llama4 is a fraud, performance even worse than llama3
@stainlu
Oh really? I was about to test it. What scenarios have you used to test Llama 4's performance?
Grimo
@mina_cheragh structured output, command understanding, and creative writing
@stainlu
It's a pretty solid test.
Any AI models you prefer for executing these tasks?
minimalist phone: creating folders
@stainlu I like the idea behind Grimo. Yesterday or the day before I saw a post from @mina_cheragh and she also mentioned the option to store LLMs in one place.
Grimo
@busmark_w_nika @stainlu
Yes. There are solutions out there but curious to know if you have used any of them.
Grimo
I use ChatGPT's O1 pro mode as a virtual consultant. I've tried and still run deep research in other models with the same tasks, but I'd say O1 pro mode is more my taste.
minimalist phone: creating folders
@hyuntak_lee What do you think about the above-mentioned deepseek? Do you trust the service?
@busmark_w_nika I don't want to judge the service, but I'll say that, from a technical point of view, understanding how it encrypts and protects user information hugely deterred me from using it.
Might be a fluke, but I asked Llama 4 the other day about itself. It didn't know anything about the existence of Llama 4 😅. Some reports from friends in the AI business claim Llama 4 isn't living up to the hype. But I love the competition!
minimalist phone: creating folders
@sentry_co TBH, I haven't tried it yet. The only thing from Meta I think I used was their AI option within Facebook (or Instagram) last week, and it failed in Slovak :D It was after 10pm, so I didn't bother to speak English :D
🧸 Here you go honey 🍯 alpaca.chat 👈 (test llama 4 for free)
Gemini Flash 2.0 has been amazing for us.
Massive context, high precision, doesn't speak robotically, uber cheap.
minimalist phone: creating folders
@matt__mcdonagh So do you use it more than ChatGPT? :)
@busmark_w_nika Correct, we don't use OAI at all.
Google nailed it.
I use ChatGPT because it fully handles the tasks I’m comfortable delegating to AI.
I’ve tried others, but it’s the only one I’m willing to pay for.
minimalist phone: creating folders
@kirill_golubovskiy We have it the same way. We pay only for ChatGPT + Visual Electric for images, but I think with the new, improved model that creates better images, we will be team ChatGPT (all in).
DeepSeek is good, BTW. Is it true that DeepSeek's data originates in China?
When ChatGPT was launched, it got the hype and users within a short time, but I didn't think any other AI tool would get that kind of popularity.
minimalist phone: creating folders
@isha_nasir AFAIK, the founder of DeepSeek even met the Chinese President Xi Jinping: https://www.reddit.com/r/singularity/comments/1iri7md/liang_wenfeng_deepseek_meets_xi_jinping/
ChatGPT has the advantage of being the first conversational AI that was easily available to people, and OpenAI's marketing and development efforts are also a reason it has such a large market share, as GPT keeps updating and keeps up with trends faster than the others. If we talk about trust, then I wouldn't recommend trusting any of them, as all of them are training on our data and each of our prompts helps them improve. Dunno if that's a good thing or bad.
minimalist phone: creating folders
@ursus_viola Totally agree tho ChatGPT already knows me better than myself :D
Headliner
I have a preference for ChatGPT. In my case, it 100% had that first-mover advantage.
I have been using Gemini more and more, but that is because Google is building it into the Google Suite tools. I currently pay for Grammarly PRO and have been considering churning, since the Gemini tool is becoming more accessible, is free, and the output is just as good. I would expect adding Gemini to Google products, like Google Docs, will increase usage - although it's unlikely to get the same direct volume as other LLMs, imo.
minimalist phone: creating folders
@elissa_craig IMO, it was a great advantage for OpenAI to be first, so in many cases people use it only because of that fact (historical background) and are more likely to pay for GPT than for any other model.
Stepping back from the original question, I think one of the most fascinating parts of this whole space is that many of these AI projects don't fully understand the potential or limits of their own models.
That’s because a significant number of them are self-learning or highly adaptive systems, which means that even the creators are constantly researching and exploring the behaviors of their own models.
What I’m getting at is this: while it's popular to compare LLMs based on performance, benchmarks, or niche use cases, these comparisons often rely on incomplete or evolving research.
So drawing strong conclusions from such research can be misleading — especially when the models themselves are still changing and being discovered in real time.
This doesn't mean we shouldn't compare or evaluate. It just means we should take all those conclusions with a grain of salt — and remain open to the idea that what we think a model is good at today might not hold tomorrow.
Great breakdown, Nika! It’s interesting to see how the landscape is evolving with each new release. Personally, I lean toward OpenAI's ChatGPT (especially GPT-4) for its balance of reasoning, creativity, and safety. It’s been refined over time and seems to offer the most consistent performance across a wide range of tasks. That said, Llama 4 is definitely one to watch — Meta’s new lineup sounds ambitious, especially with names like “Behemoth” 👀
Grok has potential, but I share your hesitation about potential censorship. Same goes for DeepSeek — the performance looks promising, but the data origin questions still linger.
I think it’s similar to how people pick AI image enhancers: some go for the most polished (like Remini or Let’s Enhance), while others explore open-source options that give them more control. Ultimately, it’s all about trust, transparency, and how much creative freedom you want from the tool.
Which model are you finding most useful lately?