Alex Gap

Windsurf users: which models are you using?

There are a lot of options for models these days. I've been using Claude 3.7, but I'm curious what's been working well for others. What model are you using, and why?


Is the thinking version of @Claude by Anthropic worth the extra credit spend? Does @DeepSeek work well enough to save some credits? Is it worth trying any of the @ChatGPT by OpenAI models?

Replies

Michael Adegoke

Claude 3.7 appears to be more intelligent when used with Windsurf; it understands the project better and corrects itself if it makes an error. Before its release, I used Claude 3.5, which is still much better than DeepSeek and even ChatGPT. My use case was scraping with Python and doing some analysis with pandas. I also used it for some NLP; after a couple of tests, none of the other models worked as well as Claude 3.7 did.

Sergei Zotov

Privacy is my first priority, which is why I don't use Windsurf. The model I use is Qwen2.5-Coder-32B-Instruct.

Alex Gap

@zot Do you run Qwen on your own machine? Is it pretty resource intensive?

I have a home server that's not doing too much right now so maybe it's something I should consider 🤔

Sergei Zotov

@lagap I actually run it on my MacBook Pro with an M4 Max (64 GB), with VS Code + Continue.dev + Ollama


It's not as fast as the OpenAI or Anthropic models, of course, but I have two options for dealing with that:

  1. Either use a smaller model (the output quality will decrease)

  2. Or, instead of just plain vibe coding, outsource to it simpler tasks that take time, while I work on system design and algorithms. E.g. "I need an FAQ template; here's the structure with one accordion item. Populate the accordion items with this list of questions and answers:". Doing that manually takes some time, and it's quite simple, so I let an LLM do it for me :)
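That accordion-fill task is mechanical enough that a small local model handles it fine. As a rough sketch of the kind of output involved (the HTML structure and Q&A content here are made up for illustration, not from the thread):

```python
# Hypothetical sketch of the boilerplate task described above:
# expanding one accordion-item template over a list of Q&A pairs.

ITEM_TEMPLATE = """<details class="faq-item">
  <summary>{question}</summary>
  <p>{answer}</p>
</details>"""

# Example Q&A list (illustrative content)
faqs = [
    ("What models does Windsurf support?", "Several, including Claude and DeepSeek."),
    ("Can I run a model locally?", "Yes, e.g. with Ollama and an open-weights model."),
]

# Fill the template once per Q&A pair and join the items into one accordion
accordion = "\n".join(
    ITEM_TEMPLATE.format(question=q, answer=a) for q, a in faqs
)
print(accordion)
```

Trivial by hand too, but with a dozens-long Q&A list it's exactly the kind of rote expansion worth delegating.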

I also use it to educate myself. For example, I know a bit of Python/Django, but I don't know how to create MJML templates for the emails it sends, so I have the LLM do it for me, and through that I learn it myself (while also getting roughly a 90% result right out of the gate).


As for the home server: unless it has a 24 GB GPU, I wouldn't recommend it

Rohan Gayen

@zot How is the code quality?

Sergei Zotov

@admiralrohan pretty good! You can also take a look at the benchmarks: https://www.prollm.ai/leaderboard


Among open-source models that aren't larger (in parameters, and in the GB needed to run them), I believe it's the best at the moment, with deepseek-coder-v2 another great choice with fewer parameters


I currently use it with Python: both the data-science stack (pandas, NumPy, scikit-learn, XGBoost, etc.) and web development (Django with HTML/Tailwind CSS)


Of course, Claude and O1 are better in terms of quality, but since I need to keep my code private, I'm not complaining

Plamen Ivanov

I've been using all three, OpenAI, Claude, and DeepSeek, primarily for coding. Claude 3.7 (and even 3.5) has been impressive for my needs, delivering consistent quality. DeepSeek-v3 is also performing really well and feels like a great cost-effective option.

That said, I’ve been a bit disappointed with ChatGPT’s performance lately, which pushed me to explore alternatives like Qwen models—surprisingly solid for many tasks! Curious if others have had similar experiences or found hidden gems.

André J

Yep. Using Claude Sonnet 3.7 a lot right now as well, for simple code tasks in Cursor. It's very cheap and feels as good as O1 mini. I use O1 regular for more advanced coding tasks that need more thinking power. I also wanted to try Google Gemini 2.5 Pro, but Cursor hasn't added it to their model list yet 😾

Alex Gap

@sentry_co What kind of complex coding tasks do you switch to O1 mini for?

André J

Not O1 mini, O1 regular (~$100 a day). It's hellishly expensive, but it packs more punch for more complex, higher-level tasks. I wrote a bit about when I use deluxe models vs. cheaper models here: https://eoncodes.substack.com/p/how-i-built-an-ai-wrapper-saas-in TL;DR: with the expensive, slower models like O1 you get more on-point, better-researched solutions for abstract problems with more than one answer. Perfect for fast hackathons where you need to cover a lot of ground with a lot of brain power.

Less so for day-to-day coding on bigger projects, where you iterate more slowly. There I mostly use PPLX or no LLM at all, as it's usually more about evolution than revolution: making sure tests work, tweaking design, fixing smaller things I spot, etc. But if I need to add a new feature, I might drop into hackathon mode for a bit and use Sonnet 3.7, or O1 regular if Sonnet can't come up with the goods. I use BYO API keys in Cursor for OpenAI, Claude, and Google Gemini to get unlimited access to the best models.

Alex Gap

@sentry_co That makes sense and kinda jibes with my understanding. I tend to do more iterative work on bigger projects, so O1 is probably not going to be a huge game changer for me.

I also haven't considered BYO API keys. It'd be nice to have access to the latest models, but I was afraid of racking up a huge bill. I've also never really priced out the cost difference between using my own API keys and paying for Windsurf credits.

André J

Using O1 regular on a mature project also has its merits. I do refactor sweeps sometimes: just go through a lot of code and ask O1 if anything can be improved. I think of O1 regular as a senior dev I have access to, and O1 mini etc. as more like interns that can find me ideas for doing something on Stack Overflow.


As for cost: even spending $500-1000 a month on BYO keys is still just a fraction of a dev salary in the West. And time is money, so you probably save 10-20x what you spend. Also, tech is exponential: the better the code, the more exponentially valuable what you build becomes. I think the math is in favour of spending here. It's like how you don't use a MacBook Air from 2011 when there's a Mac Studio with 32x the CPU that can build your projects in seconds vs. hours on the MacBook Air.
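To make that back-of-envelope math concrete (every number below is an illustrative assumption, not a figure from this thread):

```python
# Back-of-envelope math for BYO API-key spend vs. developer time.
# All inputs are assumptions chosen for illustration.

api_spend_per_month = 1_000      # upper end of the $500-1000 range above
dev_cost_per_month = 12_000      # assumed fully loaded monthly dev cost
working_hours_per_month = 160

hourly_rate = dev_cost_per_month / working_hours_per_month  # $75/hour here

# Hours of dev time the spend must save per month to pay for itself:
break_even_hours = api_spend_per_month / hourly_rate
print(f"Break-even: {break_even_hours:.1f} hours/month")

# To hit the claimed 10x return, the time saved would need to be:
hours_for_10x = 10 * api_spend_per_month / hourly_rate
print(f"For a 10x return: {hours_for_10x:.0f} hours/month")
```

Under these assumptions the spend pays for itself after roughly 13 saved hours a month; a 10x return implies saving on the order of a week-plus of dev time, which is the claim worth sanity-checking against your own workflow.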


It's a bit perplexing to me why there aren't pay-as-you-go options for Cursor and Windsurf, etc., or more premium subscriptions for that matter. Most businesses would easily pay $500+ to supercharge their dev teams and save a lot of time and headcount.

Samuel Hart

Claude 3.7 can be painful with the amount of extra spend, but it's also super powerful. We power HIPAA-compliant AI tools with hathr.ai, and we're watching for when 3.7 is approved for regulated work.

Aniket J

I stick to Claude Sonnet 3.7 and 3.7 Thinking (for larger issues where I need to brainstorm). I think model selection also changes from codebase to codebase. Personally, I've been using Windsurf since I first published my MVP, so all the bigger features and improvements were done with Windsurf + Claude 3.5 and now 3.7. It handles my codebase really, really well.