Code Smarter, Not Harder: Developing with Cursor and Claude Sonnet (jstoppa.com)
58 points by jstoppa 34 days ago | 74 comments



Something that's been amusing me about Cursor is that I feel a lot of the excitement about it isn't so much about Cursor, it's people realizing that LLMs have got REALLY good at writing code now. GPT-4 was good for the past year, but the latest models (especially Claude 3.5 Sonnet) are spectacular.

Those of us who've been copying and pasting LLM-generated code back and forth from ChatGPT and Claude.ai for the past year had figured this out already, so Cursor wasn't a huge surprise for us.

For a lot of people Cursor is the first time they've really appreciated how good this stuff has got - so it's getting a massive amount of buzz, much of which should really be credited to Claude 3.5 Sonnet.

(Cursor have done an excellent job designing editor features on top of that model and deserve credit for that, I'm just a bit tickled at some of the excitement which basically boils down to "Huh, LLMs can write good code now".)


I don’t share the sentiment that it’s “very good”. It’s strongly mediocre at basic tasks and regularly fails at anything complex.


Have we been using the same models?

Claude 3.5 Sonnet writes me excellent code in a variety of different languages. I have a bunch of examples here: https://simonwillison.net/tags/claude-3-5-sonnet/


For Python it's bliss. For non-std C++ it's hit-and-miss. For CUDA C++ it's terrible.


For POSIX sh it generates kind of ugly and unnecessarily complex scripts.


"POSIX sh it" hehehe - my inner 5 year old is too pleased right now.


Yes we have, but probably for different problems.


I'm getting what I consider very good results. Yes, I have to iterate with it and talk to it, and spec things very clearly, but if you know pretty precisely what you want, I've found it very reliable.

I usually try to provide actual AWS resource IDs, VPC IDs, bucket names and everything, and at least with AWS / Python stuff it is good (yes, it still cuts some corners, but the trend is very good).

My prompts aren't short, but they spell out very clearly what I want. Say: process 2 million XML files stored in S3, using an SQS queue ZZZ that will pass in S3 object IDs in this format, and store the results back to RDS PostgreSQL here. Use Loguru or whatever for logging. I provide sample XML files, sample results, the schema for the database (DDL), etc. Then you can iterate on this - deploy using Docker to ECS, use Fargate, then do some scale-out when queue depth is deep, go to asyncio or multiprocessing or multithreading, etc., and just build. It goes quick.
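
For the curious, a minimal sketch of the kind of worker such a prompt tends to produce (the queue URL, message format, and table are placeholders I made up; assumes boto3, psycopg2, and Loguru):

    import json
    import xml.etree.ElementTree as ET

    import boto3
    import psycopg2
    from loguru import logger

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ZZZ"  # placeholder

    sqs = boto3.client("sqs")
    s3 = boto3.client("s3")
    conn = psycopg2.connect("dbname=results")  # placeholder DSN

    def process(xml_bytes: bytes) -> dict:
        # Placeholder extraction; the real logic depends on the sample XML and DDL.
        root = ET.fromstring(xml_bytes)
        return {"root_tag": root.tag}

    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            body = json.loads(msg["Body"])  # assumed shape: {"bucket": ..., "key": ...}
            obj = s3.get_object(Bucket=body["bucket"], Key=body["key"])
            row = process(obj["Body"].read())
            with conn, conn.cursor() as cur:  # commits the transaction on success
                cur.execute("INSERT INTO results (s3_key, data) VALUES (%s, %s)",
                            (body["key"], json.dumps(row)))
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
            logger.info("processed {}", body["key"])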


That doesn’t sound like an inherently complex programming task - just simple mindless munging


That's a failure of imagination. If I can architect and describe my interfaces and have Claude write the current component, then I can repeat with every other component and write minimal code / corrections.

At the same time, I've minimized complexity in my codebase. It's a win-win, and honestly probably the next paradigm of software engineering.


Depends on the problem at hand. Some code has high essential complexity. Claude doesn’t do well with math-heavy work, in my experience.


I must be living in a different universe; every single LLM I've tried is shit at the details/nuances of writing code.


I tried getting Claude to help me model an event-based store with moderate type safety and it was a total disaster. It was randomly trying to convince me to turn it into a state machine using a library in Python (I was working in TypeScript).

This is so far from awesome as far as productivity goes. I do find cursor suggestions useful here and there, but it’s typically filling in the simpler details of an implementation I had to design without its help. Ultimately it doesn’t save nearly as much time as I hoped it might a couple years ago.

Still, cool development in terms of dev tools and I’ll keep experimenting.


>>I tried getting Claude to help me model an event-based store with moderate type safety and it was a total disaster.

That's just not how you work with an LLM. You don't give LLMs problems to solve. You do the thinking part and solve problems, and ask the LLM to implement small blocks of code, like the smallest possible, and you incrementally go from there.

This is somewhat like typing the whole question into the Google search bar, not seeing any result, and concluding that Google search doesn't work.

>>it’s typically filling in the simpler details of an implementation I had to design without its help.

Most complex solutions are just reusing and connecting simpler solutions.


This is how I was using it. I was trying to work through each part of the model, but kept running into an issue in which it would recommend implementing things with Python instead. It wasn’t impossible or even inadvisable to do it with TypeScript, so I have no idea where the impetus to switch to Python came from.

The issue occurred specifically when I was pointing out that type inference wouldn’t work as expected for typing the store and its corresponding events. I’d feed it some types that were close to what we needed and it would give up and show me how to do it with Python. I gave up after (I wish I were exaggerating) around 2 hours of trying to convince it that it was possible to build and type the store correctly.

The store is built and working properly at this point so who knows, maybe the meta layer of typing things really throws LLMs.


> ...and ask the LLM to implement small blocks of code, like the smallest possible, and you incrementally go from there.

So it's autocomplete++? Talk about an insane level of hype for an incremental improvement to what we already have...if that's actually what we can hope to expect from these LLMs.


At their heart, LLMs are basically a mechanism to guess (predict, based on probability) the next word given what they have already done/seen so far. How they go about guessing, or how they get the context, is based on attention and multi-head attention functions. You could say that these functions decide which parts (words) of a question/sentence must get a higher weight. That basically provides the context based on which the model predicts what needs to come next. How it predicts is basically your plain old neural network. Just like in linear regression, where if your model is fitting a straight line you know in which area the next points are likely to appear, similar mechanisms are used here to guess what the next words are likely to be.
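
A toy version of that guessing loop, for illustration only - a real model replaces the counting below with a trained network and operates on tokens rather than whole words:

    import random
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ate the rat".split()

    # Count which word follows which: a crude stand-in for a learned model.
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:] + corpus[:1]):  # wrap so every word has a successor
        follows[prev][nxt] += 1

    def next_word(prev: str) -> str:
        # Sample the next word in proportion to how often it followed `prev`.
        options = follows[prev]
        return random.choices(list(options), weights=list(options.values()))[0]

    word = "the"
    for _ in range(5):
        word = next_word(word)
        print(word, end=" ")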

It is an extreme autocomplete feature in the context of code, for sure. Note that LLMs are not sentient. That means they can't be held responsible for making decisions, code decisions even less so.

Now you can argue it's nothing special. But that's somewhat like arguing Eclipse/IntelliJ are nothing special compared to vim/emacs. That is just splitting hairs. IDEs definitely do a lot of productive work compared to plain text editors.

The initial demos of LLMs confuse new users a lot. They go along the lines of giving a sentence like 'Implement a todo list app' and the LLM writes some code implementing it. That's the wrong way to work with LLMs. Don't outsource your thinking or hand over a blanket problem statement to solve. Think of LLMs as tools that do a lot of quick text writing for you, given the smallest, least ambiguous, most atomically implementable/rollbackable statement possible.

It takes a while to get used to this, but once you are, you are more productive.


LLMs...60 percent of the time it works every time.


Which language are you using, and what kind of code are you writing? So far I've had good results for Javascript/Typescript and SQL for web apps and Unreal C++ for game development. I can imagine it would do way worse though if you were writing something with a lot of domain-specific knowledge, really cutting-edge stuff or in a more obscure language like Fortran - but then again, I think that most developers would also struggle more in those conditions.

Part of what I find useful about LLMs is that it's almost like rubber duck debugging, except that the duck talks back. You have to explain to the LLM exactly what it is that you want it to do to get the right result, then it gives you a solution. Mostly the solution is right, but when it's wrong you'll better understand the problem through the act of writing it out.


> You have to explain to the LLM exactly what it is that you want it to do to get the right result, then it gives you a solution. Mostly the solution is right, but when it's wrong you'll better understand the problem through the act of writing it out.

The solution is either correct or incorrect. In general, its level of incorrectness is not trivial. My language of choice is C#. Anytime I ask it to generate a series of classes (say a data model of a particular domain, using specific well-known patterns), it gets it wrong, quite wrong, and it would take more effort to explain to it how to unfuck itself, by which time it would have forgotten 3 deliberate instructions from 5 interactions ago. It’s a brilliant moron.


>>The solution is either correct or incorrect.

Not even humans write 100% correct solutions. Everyday programming is not a theorem-proving exercise. Programs are correct enough to cover an agreeable number of test cases.

>>Anytime I ask it to generate a series of classes

Generating a class is too big and broad a question to ask of an LLM. My chunks of input and output to/from an LLM are often along the lines of a single for loop, an if block, or a small chunk of code that can't be written in more than one way. There are several steps from there to a complete class.
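
To make the scale concrete, a hypothetical chunk-sized ask might be "sum order totals per customer, skipping cancelled orders" (the data shape here is invented), and the expected output is just:

    # One loop, one obvious way to write it - the size of a single LLM ask.
    orders = [
        {"customer_id": 1, "total": 10.0, "status": "paid"},
        {"customer_id": 1, "total": 5.0, "status": "cancelled"},
        {"customer_id": 2, "total": 7.5, "status": "paid"},
    ]
    totals = {}
    for order in orders:
        if order["status"] == "cancelled":
            continue
        totals[order["customer_id"]] = totals.get(order["customer_id"], 0) + order["total"]
    print(totals)  # {1: 10.0, 2: 7.5}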

The LLM is your classic Socratic thinking tool. You really have to learn to ask and work in chunks of questions that are as small as possible, involve only one change from the previous question, and are atomic enough to allow an easy rollback. And you build from there.

As experience with the series of books The Little Schemer shows, this unfortunately is not everyone's cup of tea. A lot of people struggle to keep a train of thought in their minds and to work in that workflow.

To that extent, I see that LLMs are not for everyone. Just like how Google search was great and helped a lot of people, but the majority just couldn't bring themselves to sit in front of a screen for hours stitching solutions from various places on the internet into a working solution.


Claude has no trouble with SQL queries, writing tests, utility functions, component sketches (React). But as soon as I start giving it reasonably complex stuff like interdependent generics (TypeScript) or intricate custom logic on top of well-known frameworks, it quite often goes around in circles or outputs code with incorrect syntax.

You know, the stuff that I would actually feel as though I was receiving intellectual help with.

Current LLMs are _great_ at doing chores. They are not so great at producing cohesive, professional-grade code.


80% of folks are doing web stuff; as soon as you deviate from that, the quality must go from 9/10 to 3/10 or whatever.

Also, have you tried shorter blocks? "Write a function like this" instead of "write the app".


I would love to see someone on stream use AI really effectively.

I really try, but for the life of me, I can’t get AI to generate anything useful beyond simple tests.

They’re still autocomplete++ for me.


Agreed! But also...

Rather than your seeing someone stream "really effective 99th percentile AI coding", maybe it would be more illustrative for us all to see someone stream the "50th percentile developer unassisted" experience.

I suspect that's the thing we're all missing to help us understand the potential value of AI coding: how slowly and ineffectively the median software developer works, because they just quietly slog away and don't talk much about how they work :)

I humbly include myself on that thought btw


I also don’t understand the appeal. I never wrote boilerplate code in my life.

I recently spent the whole day debugging open source third parties on multiple OSes while using Wireshark to understand how the whole thing worked. Then I typed 5 lines for a fix in less than 30 seconds. How would an LLM help me here?


> I never wrote boilerplate code in my life.

You either have not written very much code, don’t know what boilerplate means, or are intentionally operating under an overly broad definition


You’re right. I don’t write a lot of code. It’s always very specific C++ stuff that does not need to be repeated a hundred times, or if it does, it's a virtual method declaration that I copy into a subclass. No need for an LLM for that.

I’ve only seen boilerplate used by web developers, but I’m not doing that.


You shouldn’t speak so confidently and condescendingly with so little experience.

Graphics programming, data science, anything involving handling user input and probably many more classes of software require ceremony and boilerplate to get going.

Vulkan famously takes 1000 lines of C++ boilerplate before you can draw a triangle.

Look into writing an X or Wayland compositor and you’ll see a mountain of boilerplate involved in handling input loops, displays, and communicating with the kernel.


Please don’t misquote me.

I want to see how someone who’s confident in using these tools write code with them.

I’m not as interested in the median developer’s coding practices.


Autocomplete++ is the killer feature for me, especially for more tedious things like SQL column naming.


A good stream example might be this one from a podcast; it's got a few cuts, but you generally get the point, and it's by a complete beginner, 1 month in:

https://www.youtube.com/watch?v=ehK-QqPstJ4


I find it effective to treat it like a junior and provide detailed and specific tasks, i.e. usual LLM usage.

A recent example: "Use the JavaScript module pattern to create a module that stores audio chunks and recording state, and exposes functions to start and stop recording, and to store audio data."

It's a really simple task, but that prompt saved me 20 minutes of typing and futzing about. It also named the module sensibly, based on context from the rest of the code, I guess. Then it prompted me on what to change in the calling code to use the new module (this wasn't 100% right, but it still saved me a couple of rounds of running the code and fixing problems).
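
For flavor, a rough Python analogue of the module that prompt describes (the actual ask was JavaScript, and these names are my own invention, not the generated code):

    # A small state-holding module: audio chunks, recording state, and
    # functions to start/stop recording and store audio data.
    class AudioStore:
        def __init__(self):
            self.chunks: list[bytes] = []
            self.recording = False

        def start_recording(self):
            self.recording = True

        def stop_recording(self):
            self.recording = False

        def store_chunk(self, chunk: bytes):
            if self.recording:
                self.chunks.append(chunk)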

It's taken me a while to change my habit of working out how to do what I want to do and then editing code, to rather just saying what I want to do, and being specific about it.


Even in cases like that, I find that it’s often wrong about whatever package I’m using, unless that package has had a stable, long-lived API.

I can maybe pull snippets out of AI-generated code, but more often than not I just ask it for quick examples. Like I would do with Stack Overflow.


Cursor wowed me a year ago when I first used it. Been using it ever since. People are only starting to notice it now because of the virality machine. AI has been reasonably "good" at code for a while. I think Karpathy finally talking positively about it a few weeks ago is what started off the current hype cycle.


How would you compare GPT-4o with Claude 3.5 Sonnet?


I gave up on GPT-4o a while back. I've spent the last half hour (more like wasted it) trying to get Claude 3.5 Sonnet to do my bidding (nothing super complicated) and it's a complete mess. It forgets plenty. It gets things wrong. It's like the weatherman: you just can't trust a word he says.


For my prompting style and the languages I work with - Python, JavaScript, SQL - I think Claude 3.5 Sonnet is slightly better.


Thanks! It's interesting how this perception seems to vary based on language.


Not the parent, but Claude any day of the week; it seems to keep context and do what I ask better.


I've been using Cursor a bit. I don't know if it's because what I'm doing is not frontend web development, but the results have not been as insane as others claim. I don't have any difficulty creating basic web layouts or forms, so an AI assistant doing that for me is nice, but it does not transform my work.


I haven't done anything with Cursor that I wouldn't be able to do without it so far, but for me it's made certain tasks _much_ faster.


I have. I wrote a GPU implementation of the marching cubes algorithm in GLSL: a realtime topology viewer for electron density grids that slices through an electron voxel grid represented as a 3D texture on the GPU.

I don't know GLSL (but I am very experienced with C/C++/Python), and I wrote this by using Cursor and muddling through it, all in one afternoon.

Could I have learnt GLSL and coded it myself? Maybe, maybe not. We are at the point now where I was able to produce something commercially useful to my platform, in a language I didn't know, in a single afternoon. GLSL is pretty similar to C, but even still, there's a certain threshold of effort and time where "could I have done it with enough time" transforms into "I couldn't have done it", and we are close to that.


My issue with this line of argument is that people always want to compare "do it with Copilot" to "do it completely from scratch" when they should be comparing it to "do it by ignorantly copy-pasting from one of the many similar projects on GitHub, then tweaking a few things." There are quite a few open-source GLSL implementations of marching cubes; maybe copy-pasting would have been faster and higher-quality.


For me, categorically not. I had multiple two-way dialogs about implementation details and highly specific questions during the implementation that we created together. It wasn't just saying "Implement X" and then "There's a bug in Y"; it's things like "Explain this data structure and why you are allocating memory this way", or "The algorithm is mirrored incorrectly on the Y axis, fix the coordinate system and give me an option to change it". The latter example took Cursor like 3 seconds to complete perfectly. I'm not exaggerating, it was 3 seconds. It had implemented the solution faster than I could have found the problem by scrolling the mouse wheel and reading the code with my eyes (let alone THEN fixing it). Imagine a whole afternoon of this high momentum. Is it perfect? No. Is it net-positive? Yes, a LOT.

Going away and finding implementations, and then trying to integrate them (they undoubtedly use different data structures, functions, etc.) would have been MUCH slower, MUCH higher effort, and I would have given up much earlier. Having some{one,thing} there I can just ask a highly specific question and get an equally specific answer with examples RIGHT IN THE IDE kept the momentum up.

There's absolutely no way finding other examples on github would have been faster or higher quality. This is no longer a matter of taste; it's the practical difference between complete and incomplete.

I mean, this went from "I don't know GLSL at all" to "Here is a complete implementation of a realtime electron density grid viewer running in WebGL in the browser" in an afternoon.


> There's absolutely no way finding other examples on github would have been faster or higher quality.

How do we ascertain its quality? The problem is this absolute trust in your reply. How do you know what it tried to "explain" was the right explanation?

At least when you search and find examples, you evaluate multiple potential solutions.


I absolutely don't trust it blindly at all. Where did you get that from?

I kept asking it questions and stepping through the debugger until I understood its implementation. How do I know its implementation is correct? Because I can see the results, I can see the data structures in memory, I can step through it and understand it - I know what electron densities around atoms look like, and I kept iterating on the code after it made mistakes, helping it and fixing it together, until it was finished. I just kept asking questions and iterating so I could learn what I needed of GLSL to get it "un-stuck" when it hit a dead end or got caught in a loop.

I don't expect it to come up with the correct implementation straight away. What I'm saying is that the enormous productivity increase kept momentum and enthusiasm up so much that I was able to implement something new and novel, something that would otherwise not have been created.


> I absolutely don't trust it blindly at all. Where did you get that from?

Not consulting a second source, or just looking at the results, as in...

> Because I can see the results

Which, yes, it's correct in that sense, but as per the other comments, you can copy an example and get that same result. In development a lot of things are correct but have different implications, e.g. bubble sort vs quick sort.

> What I'm saying is that the enormous productivity increase

Assuming it has led you down the right / correct path. It's oftentimes led me down the wrong path instead.


> Not consulting a second source, or just looking at the results, as in...

I do continuously check multiple sources: reality, for one. Our material simulations and predictions are lab-verified, and spectrographic analysis shows our predictions are correct - I have large experimentally generated datasets that our predictions and code are verified against.

We even have a system called "reality server" whose job is experimental parity; it runs continuously, checking predictions (which are all totally produced by our code - code we are writing with the help of Cursor) against experiments.

> Which, yes, it's correct in that sense, but as per the other comments, you can copy an example and get that same result. In development a lot of things are correct but have different implications, e.g. bubble sort vs quick sort.

All of us approximate to "good enough". This is good enough. Results, Big-O, integration ease: good enough is multivariate, but good enough is good enough. I'm a startup, and I'm not searching for divine correctness; good enough on the multiple variables is good enough.

> Assuming it has led you down the right / correct path. It's oftentimes led me down the wrong path instead.

It led me down the wrong path many times; that just means you are not yet finished. Then, with more work, we found the correct solution together.

Even in our materials simulations we fail 100 times and win once, the win still enormously outweighs the fails.

Nobody is claiming it's perfect, nobody is claiming it doesn't get stuff wrong, nobody is claiming it doesn't lead you down the wrong path. It's about keeping experimental momentum up, because discovery is a factor of productivity - and my discovery is 5x because my productivity is 10x.


I was doing something to graph some numbers against each other using pandas and whatnot recently.

I don't use this setup much or do this kind of work much. For the first part I stumbled through, doing it myself and referring to the docs.

Then I installed Cursor to give it a go. I did a few things in less than five minutes that would have taken me 30 minutes or more of reading the docs, having a go, reading Stack Overflow, and going back and forth on the code.
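
The sort of snippet in question, roughly (the file and column names are made up for illustration):

    # Minimal pandas plot of two numeric columns against each other.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("measurements.csv")            # assumed input file
    df.plot.scatter(x="temperature", y="pressure")  # pandas wraps matplotlib
    plt.show()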

There is a question as to whether I would have understood things more and learnt things more deeply had I done it the old way.

But still. It's impressive. Coupled with good software engineering practice, it's going to enable people to produce even more.


It's commercially useful only if your team can maintain it after you're gone though :)


Not so for consulting


The problem I have is that when I ask it to do something that doesn't have a million examples on GitHub from tutorials, it will just output something completely non-functional.

I find it good for things that are completely derivative. Like, "implement a function that converts this type into another type"


Can you give some examples?


So this Cursor is a fork and not an extension? Could anyone tell me why? What was not possible to achieve when implementing these features as an extension?


> What was not possible to achieve when implementing these features as an extension?

You cannot extend everything. At least at the time Cursor started, a lot of what Copilot relied on was behind beta and gated APIs / functions. It's opened up a little more now, but possibly VS Code still doesn't expose everything required.


Taking an open source project and making it closed source, no less. Poor taste IMO.


Damn. I was so excited to try Cursor, and when I set it up all my plug-ins got screwed up and go-to-definition no longer worked. I guess I’ll try again this weekend.

I really want to code with voice to text via LLM. Why am I typing any code with my hands these days?


To be clear, the main advantages over GitHub Copilot are ability to add other files as context and being able to have multiple chats going, right?


GitHub Copilot already includes open files in the context.


But not the whole repo - Cursor indexes all your files in a vector DB and then can use RAG when querying models. Perhaps the biggest benefit I get from Cursor is that I can ask it questions about the whole repo - when working with a sizable team in a large repo, this is hugely valuable.
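
For anyone unfamiliar with the mechanism, a toy sketch of that retrieval step (real systems use learned embeddings and a proper vector DB; the bag-of-words vectors here just show the shape of it):

    # Toy RAG lookup: index file chunks as word-count vectors, then pull the
    # most similar chunk into the model's context for a given question.
    import math
    from collections import Counter

    def vectorize(text: str) -> Counter:
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[w] * b[w] for w in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    chunks = {  # filename -> source snippet (stand-ins for real repo files)
        "auth.py": "def login(user, password): check credentials and create session",
        "db.py": "def connect(): open a connection to postgres",
    }
    index = {name: vectorize(src) for name, src in chunks.items()}

    query = vectorize("where do we create the user session")
    best = max(index, key=lambda name: cosine(index[name], query))
    print(best)  # auth.py - this chunk would be placed in the model's context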


Is the data sent to Cursor and the underlying LLM used to train models? This is key, otherwise it's not usable for most orgs.

Copilot can use entire repositories using Copilot enterprise and it's guaranteed none of the data is used for training purposes. https://github.blog/news-insights/product-news/github-copilo...


It uses RAG, so your whole repo is indexed locally, but of course anything relevant is put into the context window in order to get results. So in that sense it's no different from any other LLM solution, and the models you use are pluggable. You can even provide your own OpenAI/Anthropic/etc. API keys.

I honestly think the "oh no, enterprises are scared about where our code goes" thing is overblown. I mean, companies host tons of infrastructure on AWS, GCP, Azure, etc. And heck, why would a company trust GitHub (a subsidiary of Microsoft) Enterprise's guarantees and not trust OpenAI's (basically a subsidiary of Microsoft at this point) guarantees?


I think I’m fully on board the Cursor hype train.

Mind you, I am coming on board from a very particular station: I am a UX designer and developer, bouncing back and forth between Figma and our HTML/CSS/Templ template stack all day. So I am writing markup more so than code.

And after a few days of using Cursor, I’m very into it. Partly because the default color scheme and layout just feels better than any theme I’ve used in VSCode. But more so, I really like their approach to the actual UI design of the autocompletes and access to AI features. They’re making a lot of smart choices that make it easier to understand what the autocomplete is actually completing, and exposing contextual keyboard shortcuts to access more features.

It also seems to be better at picking up my UI patterns than VSCode. Maybe. I need to get some more time with it, but it really seems to pick up the typographic and spacing rhythms and patterns I’m building.


I hate VSCode so much, that having to use a form of it is a non-starter for me.


I generally prefer using vim (and have started using neovim recently). Does anyone know when/if Cursor will provide official support for plug-ins that allow me to use its functionality in neovim? I've tried VS Code a number of times, but even with vim bindings, it doesn't feel as nice as pure vim.

I'm aware of open source alternatives like Avante.nvim [1]. I mean official support from the folks at Anysphere/Cursor, Codeium, Poolside, etc.

[1] https://news.ycombinator.com/item?id=41353835


Curious whether you've tried the VSCode Neovim extension. It is “real vim” in that it connects to a Neovim server, which means you can use your neovimrc and packages and all that. It still lives inside VSCode, so there's going to be keyboard latency and the like, but I find it really good.


My first experience with Cursor: starting it from WSL doesn't work; I went down a number of GitHub rabbit holes to fix that one. Then on every update it breaks the original "code" shortcut, because it puts its path, with both "code" and "cursor" launchers, in PATH; it ostensibly gives you the choice to register either or both, but I fail to see how that can work with this setup.

It's actually a nice tool but I'm getting a "you should be a unicorn to fork ~Chrome~ VS Code" vibe.


UX is what sets Cursor apart. All the pieces it's built on already existed; they pulled them together in a coherent way that makes coding with LLMs easier.


I’ve been using GitHub Copilot as my daily driver. It is now as indispensable as the Vim extension for VSCode. It doesn't write much new code for me and, frankly, it fails miserably at most tasks. However, it does a phenomenal job with auto-completion. So, while I still feel like I am in the driver's seat, it does help me achieve tremendously more by simply extending what I have already started.

But that should be no surprise. After all, LLMs simply complete the next token in a long sequence of text based on probability. A lot of code is a sequence of patterns, so the LLM should be able to do well.

I feel that true coding agents are perhaps around the corner, but it seems to me that we are a couple of innovations away from that.

However, even with coding agents, there will simply be more people producing a lot more code (even non-developers), which I believe will drive the demand for higher-quality code, or at least code that can be understood and proven by other humans. Thus, coding agents are just force multipliers. Great developers will become greater. I wrote something about this here if you are interested: go.cbk.ai/divide. My $0.02.


I read your post; I am not sure I agree with:

> transforming a tool of empowerment into an amplifier of existing disparities.

Which disparity are we talking about? That of non-coders vs coders?

It used to be that someone who did not know how to code had zero chance of writing any program that runs.

Enabling these people to write code is a massive empowerment unlike any we have seen before. Hundreds of millions of people who had no chance of writing working code before will be able to do so if they wish. That's a lot of people reaching for lots of ideas. I doubt that a genius and insanely productive programmer would be able to outcompete all these competitors that previously did not exist.

I don't think we fully comprehend what this will mean in the future - but I doubt it amplifies disparity.


> Enabling these people to write code is a massive empowerment unlike any we have seen before. Hundreds of millions of people who had no chance of writing working code before will be able to do so if they wish.

They still are not able to code. They don’t understand a thing from the output. They cannot tell right from wrong, or correct anything by themselves. Maybe on some occasions that kind of coding is useful, but not in a professional setting.

It can be useful in learning, but so are all old school methods.

You get the most from LLMs when you are a very good programmer - that is the OP's point.


These people, who could have been talking to machines, now talk to AI companies who do the work for them. It's almost free today, but what will it cost once they need it to make money and have no alternative?


obligatory plug for aider: https://github.com/paul-gauthier/aider https://aider.chat/docs/usage.html

I run aider in a terminal, and separately review and manually code in VSCode. Using the `--no-auto-commits` switch means that I can immediately view the diffs in the nice VSCode diff view, and it's very easy to do hybrid manual and AI coding. There are plenty of handy settings (see /help); for example, you can /ask questions about your code.

`aider --no-auto-commits --cache-prompts --no-stream --cache-keepalive-pings 6 --no-suggest-shell-commands`



