Do you trust any particular LLM? (Your preferences among AI solutions)
Yesterday, Meta announced a new collection of AI models in its Llama family: Llama 4.
(It consists of Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth.)
Historically, OpenAI with its ChatGPT has been on the market the longest.
It has the biggest share (77.66%), and people have simply gotten used to it.
DeepSeek also experienced a huge boom, but there were concerns about data because of its Chinese origins.
Some people are inclined toward Grok, though they are also beginning to doubt whether Musk will start applying some form of censorship or vetting of information.
Gemini, Perplexity & Claude are also frequently cited models, but I don't feel they are used that much.
Which LLM model do you use, trust and why have you decided on that particular LLM?
Replies
The only LLM I truly trust is the one that works solely on my computer without an Internet connection.
Currently, my stack is:
General: Microsoft's phi4
Coding: Qwen2.5-Coder:32B
Pure instruction-following: Qwen2.5:32B
Reasoning (I don't use it much, but...): Deepseek R1:7B
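If anyone wants to see what that looks like in practice, here is a minimal sketch, assuming the models above are pulled into a local Ollama install and the official ollama Python client is available; the exact model tags will depend on your setup:

```python
# Minimal offline sketch: every call stays on the machine.
# Assumes Ollama is running locally and these model tags have been pulled.
import ollama

def ask_local(model: str, prompt: str) -> str:
    """Send one prompt to a locally served model and return its reply."""
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

# Route by task, mirroring the stack above (tags are illustrative).
print(ask_local("phi4", "Summarize this note: ..."))                   # general
print(ask_local("qwen2.5-coder:32b", "Write a CSV parser in Python"))  # coding
print(ask_local("qwen2.5:32b", "Rewrite this email more politely"))    # instructions
print(ask_local("deepseek-r1:7b", "Plan a 3-step A/B test"))           # reasoning
```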
minimalist phone: creating folders
@zot Yeah, I've heard that more and more people prefer running them locally. Understandable.
App Finder
@zot there may be an alternative to that now: @Privatemode AI, I think they'll have Llama 4 soon
@konrad_sx judging by what they say on their website, I'm very doubtful about this technology. There are ways to do this correctly, though. I bet they'll pivot to that in the future
App Finder
@zot how could it be done better? The only possibility I see would be fully homomorphic encryption, and that's much too slow currently.
@konrad_sx I believe in on device/self-hosted data obfuscation/masking.
For example, a person sells their property to a company, and the overall contract is kinda basic. But we just don't want to send personal data to a 3rd party app:
We need to define what data we should mask. Let's say it's this:
person: Sergei Zotov
address: 9351958120 Maple Avenue, Apt 12B, Chicago, IL 60614, USA
company: Apple Inc.
We can replace all of this on device with something like this:
person: John Doe
address: 456 Elm Street, Suite 3, Los Angeles, CA 90001, USA
company: Monsters, Inc.
Send the obfuscated document to any API we want
Convert the obfuscated data in the processed document back to the original values on device
We can also do it with numbers (e.g. apply multiplier/divider).
Btw, this obfuscation can be done even with basic text models, so it won't take up many resources on device / in a self-hosted solution
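A rough sketch of that flow, just to make it concrete; the mapping is hand-written here, a real version would need NER or regexes to find the sensitive values, and the "API call" is a hypothetical placeholder:

```python
# Illustrative only: mapping, contract text, and the "API" are all placeholders.
MAPPING = {
    "Sergei Zotov": "John Doe",
    "9351958120 Maple Avenue, Apt 12B, Chicago, IL 60614, USA":
        "456 Elm Street, Suite 3, Los Angeles, CA 90001, USA",
    "Apple Inc.": "Monsters, Inc.",
}

def mask(text: str) -> str:
    # Replace each sensitive value with its stand-in before anything leaves the device.
    for real, fake in MAPPING.items():
        text = text.replace(real, fake)
    return text

def unmask(text: str) -> str:
    # Restore the original values in whatever the remote service sends back.
    for real, fake in MAPPING.items():
        text = text.replace(fake, real)
    return text

def call_remote_llm(text: str) -> str:
    # Hypothetical stand-in for the 3rd-party API call (e.g. an HTTP request).
    return "Reviewed contract:\n" + text

contract = "This agreement is between Sergei Zotov, residing at ..., and Apple Inc."
safe_copy = mask(contract)              # only the obfuscated text leaves the device
processed = call_remote_llm(safe_copy)
result = unmask(processed)              # real names never left the machine
print(result)
```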
@zot When computers get 1 TB of RAM, 64-core CPUs, and an H100 graphics card, I will join you. I do think the cloud models will always outpace local machines though 😅
@sentry_co yeah, to get the performance of GPT-4.5 or Claude 3.7 you really need all of that 😅
But I think we're moving away from one freakishly large model that knows everything and toward smaller models for specific use cases.
For example, Qwen2.5-Coder:32B had nearly the same coding performance as GPT-4o when it was first introduced, and it only takes ~20 GB of GPU memory to run. Any MacBook Pro with Apple Silicon and 32 GB of RAM can run it pretty well, and if not, you can always use a version with fewer parameters (the 7B and 14B are also good).
I feel the same way about phi4: yeah, it can't work well across multiple languages, but it's a pretty good model for general use cases in English. Etc. etc.
In the future, I believe we're going to have some sort of router system that will just pull the best model possible for your query. It will save your GPU memory and get you results similar to the proprietary models. Actually, that's a cool product idea...
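A toy sketch of what such a router could look like on a local box; the keyword matching below stands in for whatever small classifier a real product would use, and the model tags are just the Ollama-style names from earlier in the thread, so treat them as assumptions:

```python
# Naive local router: pick one model per query so only that model
# needs to sit in GPU memory at a time. Purely illustrative.
import ollama

ROUTES = {
    "code": "qwen2.5-coder:32b",
    "reasoning": "deepseek-r1:7b",
    "default": "phi4",
}

def pick_model(query: str) -> str:
    q = query.lower()
    if any(word in q for word in ("bug", "function", "refactor", "compile")):
        return ROUTES["code"]
    if any(word in q for word in ("prove", "plan", "step by step", "why")):
        return ROUTES["reasoning"]
    return ROUTES["default"]

def answer(query: str) -> str:
    model = pick_model(query)  # a real router would use a small classifier or embeddings
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": query}])
    return reply["message"]["content"]

print(answer("Why does this function throw a KeyError?"))  # routes to the coding model
```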
Cursor has a router built in now. It uses the appropriate model for the task at hand.
I think local will be very interesting soon, especially since OpenAI is not opening their latest image model for API access. Probably because it would cause havoc in many industries, because it's so good. But local models can't be far behind, so that means the havoc will come anyway. It's just that companies can't facilitate it, so people will have to install these things themselves. Which opens up huge opportunities to create the Cursor for local, or the local AI Photoshop, etc.
And then there is MCP. And agentic flows. All thrive locally and require only basic hardware.
Good times ahead!
Grimo
for most cases we use claude 3.5/3.7 sonnet, which is stable, fast and smart. tbh in creative writing, 3.5 and 3.7 don't seem much different.
we are also playing around with gemini 2.5 pro, which is sometimes significantly better than the claude models in longer writing, but the problem is that it is also significantly slower
for even harder tasks that require reasoning, our default choice is deepseek r1 instead of the openai o series, because the latter only excels in benchmarks; the answers just seem so 'over-aligned' and often give us long but useless bullshit. r1 feels much smarter actually, and i think the reason is on the data side.
lastly, please allow me to advertise Grimo (Cursor for Writing), the only place where you can easily switch among all those latest models for FREE, even reasoning models like deepseek r1 and openai o3-mini, all in a pretty nice editor where you can literally quote any part of the doc (a chatgpt canvas alternative, maybe).
Grimo
and btw llama4 is a fraud, performance even worse than llama3
@stainlu
Oh really? I was about to test it. What scenarios have you used to test Llama 4's performance?
Grimo
@mina_cheragh structured output, command understanding, and creative writing
@stainlu
It's a pretty solid test.
Any AI models you prefer for executing these tasks?
minimalist phone: creating folders
@stainlu I like the idea behind Grimo. Yesterday or the day before I saw a post from @mina_cheragh and she also mentioned the option to store LLMs in one place.
Grimo
@busmark_w_nika @stainlu
Yes. There are solutions out there but curious to know if you have used any of them.
Grimo
I use ChatGPT's O1 pro mode as a virtual consultant. I've tried and still run deep research in other models with the same tasks, but I'd say O1 pro mode is more my taste.
minimalist phone: creating folders
@hyuntak_lee What do you think about the above-mentioned deepseek? Do you trust the service?
@busmark_w_nika I don't want to judge the service, but I'll say that, from a technical point of view, understanding how it encrypts and protects user information hugely deterred me from using it.
Might be a fluke, but I asked Llama 4 the other day about itself. It didn't know anything about the existence of Llama 4 😅. Some reports from friends in the AI business claim Llama 4 isn't living up to the hype. But I love the competition!
minimalist phone: creating folders
@sentry_co TBH, I haven't tried it yet. The only thing from Meta I think I used was their AI option within Facebook (or Instagram) last week, and it failed in Slovak :D It was after 10pm, so I didn't bother to speak English :D
🧸 Here you go honey 🍯 alpaca.chat 👈 (test llama 4 for free)
Gemini Flash 2.0 has been amazing for us.
Massive context, high precision, doesn't speak robotically, uber cheap.
minimalist phone: creating folders
@matt__mcdonagh So do you use it more than ChatGPT? :)
@busmark_w_nika Correct, we don't use OAI at all.
Google nailed it.
I use ChatGPT because it fully handles the tasks I’m comfortable delegating to AI.
I’ve tried others, but it’s the only one I’m willing to pay for.
minimalist phone: creating folders
@kirill_golubovskiy We have it the same way. We pay only for ChatGPT + Visual Electric for images, but I think with the new, improved model that creates better images, we will be team ChatGPT (all in).
DeepSeek is good, BTW. Is it true that DeepSeek's data originates in China?
When ChatGPT was launched, it got the hype and users within a short time, but I didn't think any other AI tool would get that kind of popularity.
minimalist phone: creating folders
@isha_nasir AFAIK, the founder of DeepSeek even met the Chinese President Xi Jinping: https://www.reddit.com/r/singularity/comments/1iri7md/liang_wenfeng_deepseek_meets_xi_jinping/
ChatGPT has the advantage of being the first conversational AI that was easily available to people, and OpenAI's marketing and development efforts are also a reason it has such a large market share, as GPT keeps updating and keeps up with trends faster than the others. If we talk about trust, then I wouldn't recommend trusting any of them, as all of them are training on our data and each of our prompts helps them improve. Dunno if that's a good thing or bad.
minimalist phone: creating folders
@ursus_viola Totally agree tho ChatGPT already knows me better than myself :D
Headliner
I have a preference for ChatGPT. In my case, it 100% had that first-mover advantage.
I have been using Gemini more and more, but that is because Google is building it into the Google Suite tools. I currently pay for Grammarly PRO and have been considering churning, since the Gemini tool is becoming more accessible, is free, and the output is just as good. I would expect adding Gemini to Google products, like Google Docs, will increase usage - although it's unlikely to get the same direct volume as other LLMs, imo.
minimalist phone: creating folders
@elissa_craig IMO, it was a great advantage for OpenAI to be first, so in many cases people use it only because of that fact (historical background) and are more likely to pay for GPT than for any other model.
Stepping back from the original question, I think one of the most fascinating parts of this whole space is that many of these AI projects don't fully understand the potential or limits of their own models.
That’s because a significant number of them are self-learning or highly adaptive systems, which means that even the creators are constantly researching and exploring the behaviors of their own models.
What I’m getting at is this: while it's popular to compare LLMs based on performance, benchmarks, or niche use cases, these comparisons often rely on incomplete or evolving research.
So drawing strong conclusions from such research can be misleading — especially when the models themselves are still changing and being discovered in real time.
This doesn't mean we shouldn't compare or evaluate. It just means we should take all those conclusions with a grain of salt — and remain open to the idea that what we think a model is good at today might not hold tomorrow.
Great breakdown, Nika! It’s interesting to see how the landscape is evolving with each new release. Personally, I lean toward OpenAI's ChatGPT (especially GPT-4) for its balance of reasoning, creativity, and safety. It’s been refined over time and seems to offer the most consistent performance across a wide range of tasks. That said, Llama 4 is definitely one to watch — Meta’s new lineup sounds ambitious, especially with names like “Behemoth” 👀
Grok has potential, but I share your hesitation about potential censorship. Same goes for DeepSeek — the performance looks promising, but the data origin questions still linger.
I think it’s similar to how people pick AI image enhancers: some go for the most polished (like Remini or Let’s Enhance), while others explore open-source options that give them more control. Ultimately, it’s all about trust, transparency, and how much creative freedom you want from the tool.
Which model are you finding most useful lately?